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Preface 



Numerical lattice QCD has seen many important innovations over the years. In this 
course an introduction to some of the basic techniques is provided, emphasizing their 
theoretical foundation rather than their implementation and latest refinements. 

The development of computational strategies in lattice QCD requires physical in- 
sight to be combined with an understanding of modern numerical mathematics and of 
the capabilities of massively parallel computers. When a new method is proposed, it 
should ideally be accompanied by a theoretical analysis that explains why it is expected 
to work out. However, in view of the complexity of the matter, some experimenting is 
often required. The field thus retains a certain empirical character. 

At present numerical lattice QCD is still in a developing phase to some extent. The 
baryon spectrum, for example, remains to be difficult to compute reliably, because the 
signal-to-noise ratio of the associated two-point functions decreases exponentially at 
large distances. There is certainly ample room for improvements and it may also be 
necessary to radically depart from the known techniques in some cases. Hopefully these 
lectures will encourage some of the problems to be studied and to be solved eventually. 
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1 

Computation of quark propagators 



Quark propagators in presence of a specified SU(3) gauge field are fundamental build- 
ing blocks in lattice QCD. Their computation amounts to solving the Dirac equation 



a number of times, where D denotes the massive lattice Dirac operator, r](x) a given 
quark field (the source field) and ip(x) the desired solution. 

Although the subject is nearly as old as lattice QCD itself, there have been impor- 
tant advances in the last few years that allow the quark propagators to be calculated 
much more rapidly than was possible before. As a consequence, the "measurement" 
of hadronic quantities and the simulation of the theory with light sea quarks are both 
accelerated significantly. 

1.1 Preliminaries 

1.1.1 Accuracy & condition number 

The Dirac equation (1.1) is a large linear system that can only be solved iteratively, 
i.e. through some recursive procedure that generates a sequence ipi, ip%, tp3, • ■ • of in- 
creasingly accurate approximate solutions. A practical measure for the accuracy of an 
approximate solution <p is the norm of the associated residue 



If, say, \\p\\ < e || 7/|| for some small value e, an important question is then by how much 
<p deviates from the exact solution tp of the equation. Using standard norm estimates, 
the deviation is found to be bounded by 



Dip{x) = rj(x) 



(1.1) 



p = r/ — Dtp. 



(1.2) 



U-4W <€k{d)U\ 



(1.3) 



where 



k(D) = \\D\\\\D- 1 \ 



(1.4) 



is referred to as the condition number of D. The relative error of tp thus tends to be 
larger than e by the factor k(D) (in this chapter, the standard scalar product of quark 
fields is used as well as the field and operator norms that derive from it). 
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The extremal eigenvalues of D'D, a m i n and a max , are proportional to the square of 
the quark mass m and the square of the inverse of the lattice spacing a, respectively. 
In particular, the condition number 

«(£>) = (a max /a min ) 1/2 oc (amy 1 (1.5) 

can be very large at small quark masses and lattice spacings. One says that the Dirac 
operator is "ill-conditioned" in this case. The important point to keep in mind is that 
the accuracy of the solution of the Dirac equation which can be attained on a given 
computer is limited by the condition number. 

1.1.2 Iterative improvement 

If the residual error e of the approximate solution <j> is still well above the limit set by 
the machine precision and the condition number of the Dirac operator, an improved 
solution 

= + X (1.6) 
may be obtained by approximately solving the residual equation 

D X = p- (1.7) 

It is straightforward to show that the residue of the solution is reduced by the factor 8 
in this way if x satisfies \\p — Dx\\ < S\\p\\. Note that x is approximately equal to the 
deviation of <fi from the exact solution of the Dirac equation and is therefore usually a 
small correction to <j>. 

Iterative improvement is used by all solvers that need to be restarted after a while. 
The GCR algorithm discussed below is an example of such a solver. Another applica- 
tion of iterative improvement, known as "single-precision acceleration", exploits the 
fact that modern processors perform 32 bit arithmetic operations significantly faster 
than 64 bit operations. The idea is to solve the Dirac equation to 64 bit precision by 
going through a few cycles of iterative improvement, where, in each cycle, the residual 
equation is solved to a limited precision using 32 bit arithmetic, while the residue p 
and the sum (1.6) are evaluated using 64 bit arithmetic (Giusti et at, 2003). 

1.2 Krylov-space solvers 

The Krylov space K, n of dimension n is the complex linear space spanned by the fields 

V , D V , D\..., D n -\ (1.8) 

Many popular solvers, including the CG, BiCGstab and GCR algorithms, explicitly 
or implicitly build up a Krylov space and search for the solution of the Dirac equation 
within this space. The very readable book of Saad (2003) describes these solvers in 
full detail. A somewhat simpler discussion of the CG (conjugate gradient) algorithm 
is given in the book of Golub and van Loan (1989) and useful additional references for 
the BiCGstab algorithm are van der Vorst (1992) and Frommer et al. (1994). Here the 
GCR (generalized conjugate residual) algorithm is discussed as a representative case. 
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1.2.1 The GCR algorithm 

The approximate solutions of the Dirac equation (1.1) generated by the GCR algorithm 
are the fields ipk e JCk, k — 1, 2, 3, . . ., that minimize the norm of the residues 

p h =r]-Dip k . (1.9) 

An equivalent requirement is that Dipk coincides with the orthogonal projection of 
the source field r\ to the fc-dimensional linear space DK-k (see Fig. 1.1). The algorithm 
proceeds somewhat indirectly by first constructing an orthonormal basis X0)Xi)X2j • • • 
of these spaces through a recursive process. Independently of the details of the con- 
struction, the orthogonality property mentioned above then implies that the fields 

fc-i 

Pk = V-^2 c iXh ci = (xi,v), (i-io) 

1=0 

are the residues of the approximate solutions ipk . The residues are thus obtained before 
the latter are known. 

In each iteration of the recursion, the next basis field Xfc is constructed from the 
previous fields xoi ■ ■ ■ i Xfc-i an d the source r\. First the residue pk is computed through 
eqn (1.10) and Xk is then taken to be a (properly orthonormalizcd) linear combination 
of Dpk and the previous fields. Since Xfc-i is a linear combination of Dpk-i and the 
fields Xoj • • • j Xk— 2j an d so on, it is clear that the recursion also yields the coefficients 
aij in the equations 

I 

Xi=Y, a l3 D Ph J = 0,1,2,..., (1.11) 

3=0 

in which p§ = r\. 

Once the fields xoi ■ • ■ i Xn-i are known for some n, the last solution ip n is obtained 
starting from the orthogonality condition 

n-l 

Di/j n = J2axi- (1.12) 

After substituting eqn (1.11), the equation may then be divided by D and one finds 
that the solution is given by 

n-l i 

(=0 ]=0 

Note that the right-hand side of this equation can be evaluated straightforwardly since 
all entries are known at this point. 

In total the computation of i/} n requires n applications of the Dirac operator and 
the evaluation of some \r? linear combinations and scalar products of quark fields. 
Moreover, memory space for about 2n fields is needed. Choosing values of n from, say, 
16 to 32 proves to be a reasonable compromise in practice, where the computational 
effort must be balanced against the reduction in the residue that is achieved. If the 
last solution is not sufficiently accurate, the algorithm can then always be restarted 
following the rules of iterative improvement. 
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Fig. 1.1 The approximate solution tpk £ ICfc of the Dirac equation constructed by the GCR 
algorithm is such that the distance \\r) — Dipk |] is minimized. The field Di[>k therefore coincides 
with the orthogonal projection of the source rj to the space DKk- 

1.2.2 Convergence properties 

The convergence of the GCR algorithm and related Krylov-spacc solvers can be proved 
rigorously if the (complex) spectrum of the Dirac operator is contained in the half- 
plane on the right of the imaginary axis. In the case of the Neuberger-Dirac operator, 
for example, all eigenvalues of D lie on the circle shown in Fig. 1.2. The spectrum of 
the Wilson-Dirac operator is rather more complicated, but is usually contained in an 
ellipsoidal region in the right half-plane. 

For simplicity, the Dirac operator is, in the following paragraphs, assumed to be 
diagonalizablc and to have all its eigenvalues in the shaded disk D shown in Fig. 1.2. 
The convergence analysis of the GCR algorithm then starts from the observation that 
the residue p k is given by 

p k = r] - Drpk = p k (D)r], (1.14) 

where Pfc(A) is a polynomial of degree k that satisfies pk(0) — 1. Since the algorithm 
minimizes the residue, it follows that 

=mm||p fc ( J D)r ? || < min ||p fe (£>)IINI, (1-15) 
Pk Pk 

the minimum being taken over all such polynomials. 

Lattice quark fields are large arrays of complex numbers. In this language, the Dirac 
operator D is just a complex square matrix. The assumption that D is diagonalizablc 
then implies the existence of a diagonal matrix A and of an invertible matrix V such 
that D = VAV' 1 . As a consequence 

||p fc (X>)|| = ^(A)^ 1 !! < K (V)\\ Pk (A)\\ (1.16) 

and therefore 

\\p k \\<K(V)max\ Pk (\)\\\ri\\ (1.17) 
for any polynomial p k (A) of degree k satisfying p k (0) = 1. One may, for example, insert 
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Fig. 1.2 The eigenvalues of the Neuberger-Dirac operator with bare quark mass m lie on a 
circle in the complex plane (drawing on the left). For the convergence analysis of the GCR 
algorithm, the spectrum of the Dirac operator is assumed to be contained in a disk D in the 
right-half plane with radius M and distance m > from the origin (drawing on the right). 

in which case the inequality (1.17) leads to the bound 1 

llp fc ||<«(vo(i + ^)~ fc |HI- (i-i9) 

The GCR algorithm thus converges roughly like exp(— km/M) if m/M <C 1. Note that 
the convergence rate m/M ~ 2/k(D) can be quite small in practice. For m = 10 McV 
and M = 2 GcV, for example, the estimate (1.19) suggests that values of k as large 
as 4000 are required for a reduction of the residue by the factor 10~ 10 . 

The GCR algorithm can also be applied to the so-called normal equation 

D^Dip = j] (1.20) 

and to the Dirac equation 

(ij 5 D + fi)i) = T) (1.21) 

in "twisted-mass" QCD. A notable difference with respect to the ordinary Dirac equa- 
tion is that the operators on the left of these equations can be diagonalized through 
unitary transformations. Moreover, their spectra are contained in straight-line seg- 
ments in the complex plane (see Fig. 1.3). Using Chebyshev polynomials in place of 
the power (1.18), the estimate 

\\Pk\\<2e- rk \\v\\ (1-22) 

may be derived in these cases, where r = m/M for the normal equation and r — fx/2M 
for the twisted-mass Dirac equation. 

The convergence of Krylov-space solvers is thus mainly determined by the proper- 
ties of the spectrum of the operator considered. In QCD the fact that the masses of 

A rigorous mathematical result, known as Zarantello's lemma, asserts that it is not possible to 
obtain a more stringent bound by choosing a different polynomial, i.e. the polynomial (1.18) is the 
optimal one. 
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Fig. 1.3 The eigenvalues of the hermitian operator D'D and the twisted-mass Dirac operator 
i'JsD + /it occupy the shaded line segments shown in the left and right drawings, respectively. 

the light quarks are much smaller than the inverse lattice spacing consequently tends 
to slow down the computations enormously. One can do better, however, by exploiting 
specific properties of the Dirac operator. 

1.2.3 Preconditioning 

Preconditioning is a general strategy that allows such properties to be taken into 
account. Let L and R be some invertiblc operators acting on quark fields. Instead of 
the Dirac equation (1.1), one may then consider the so-called preconditioned equation 



Once this equation solved, using a Krylov-space solver for example, the solution of the 
Dirac equation is obtained by setting ip = R<t>- If -D ~ L~ 1 R~ 1 , and if the application 
of L and R to a given quark field is not too time-consuming, the total computer time 
required for the solution of the equation may be significantly reduced in this way. 

In lattice QCD, a widely used preconditioning method for the Wilson-Dirac oper- 
ator is "even-odd preconditioning" . A lattice point x <G Z 4 is referred to as even or odd 
depending on whether the sum of its coordinates x M is even or odd (see Fig. 1.4). If 
the points are ordered such that the even ones come first, the Dirac operator assumes 
the block form 



where D eo , for example, stands for the hopping terms that go from the odd to the even 
sites. The blocks on the diagonal, D ee and D OQ , include the mass term and a Pauli 
term if the theory is 0(a)-improved. Since they do not couple different lattice points, 
they can be easily inverted and it makes sense to consider the preconditioners 



LDRcf) = Lr). 



(1.23) 




(1.24) 




(1.25) 



With this choice, the Dirac operator is block-diagonalizcd, 
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Fig. 1.4 Hyper-cubic lattices may be divided into the sublattices of the even and the odd 
sites (black and white points, respectively). Even-odd preconditioning effectively amounts to 
"integrating out" the quark field on the odd sublattice. 



LDR = 



D 
£>oc 



D = D cc - D CO D~ L D 



(1.26) 



and the solution of eqn (1.23) thus amounts to solving a system in the space of quark 
fields on the even sublattice. The condition number of D is usually less than half the 
one of D and even-odd preconditioning consequently leads to an acceleration of the 
solver by a factor 2 to 3 or so. 

Other prcconditioncrs used in lattice QCD are the successive symmetric overre- 
laxation (SSOR) prcconditioner (Frommcr et al., 1994) and a domain-decomposition 
preconditioncr based on the Schwarz alternating procedure (Liischer, 2004). The latter 
is an example of an "expensive" preconditioncr, whose implementation involves an it- 
erative procedure and is therefore inexact to some extent. Inaccuracies at this level of 
the algorithm are however not propagated to the final results if the GCR solver is used 
for the preconditioned equation (1.23). This algorithm actually always finds the best 
approximation to the solution in the space generated by applying the preconditioner 
to the residues of the previous solutions. For the same reason, the GCR algorithm is 
also safe of rounding errors. 



1.3 Low-mode deflation 

The low modes of the Dirac operator are intimately related to the spontaneous break- 
ing of chiral symmetry and therefore play a special role in QCD. Treating them sepa- 
rately from the other modes seems appropriate from the physical point of view and is 
recommended for technical reasons at small quark masses. 

1.3.1 Textbook deflation 

In the case of the hcrmitian system, 

Ai> = r], A = D^D, (1.27) 



there exists an orthonormal basis of eigenvectors Vk, k = 1, 2, 3, . . ., such that 
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Av k = a k v k , < ai < a 2 < ■ ■ ■ (1.28) 

The action on any quark field ip of the orthonormal projector P to the TV lowest modes 
is then given by 

N 

fe=i 

Since P commutes with A, the linear system (1.27) splits into the decoupled equations 

A||V>|| = v\\, V^i = Ptp, (1-30) 

A±il>±=ri±, i/>± = (l-P)il>, (1.31) 

where Aii = PAP and A± = (1 — P)A(\ — P) are, respectively, referred to as the 
"little operator" and the "deflated operator" . 

If the eigenvectors v\, . . . , wjv are known, and if there are no zero-modes, the solu- 
tion of the little system (1.30) can be obtained exactly through 



N 1 

i'\\=y2—Vk(v k ,v)' (1-32) 

k—l 

The deflated system (1.31), on the other hand, can only be solved iteratively using the 
CG algorithm, for example. With respect to the full system, the associated condition 
number 

K(A X ) = -^—k{A) (1.33) 

OLN+l 

is however reduced and one therefore expects the solver to be accelerated by the factor 
{aN+i/oti) 1 / 2 or so. 

The deflation of the hermitian system (1.27) along these lines is straightforward to 
implement, but the method tends to be limited to small lattices, because the computer 
time required for the calculation of the low eigenvectors grows rapidly with the lattice 
volume. In the past few years, it was nevertheless further developed and improved in 
various directions (for a review, see Wilcox (2007), for example). 

1.3.2 The Banks Casher relation 

In the continuum theory (which is considered here for simplicity), the eigenvalues of 
the Dirac operator D in presence of a given gauge field are of the form m + i\ k , where 
m denotes the quark mass and \ k € R, k = 1, 2, 3, . . ., the eigenvalues of the masslcss 
operator. The associated average spectral density, 

_j oo 

p(A, m ) = -^(i(A-A,)), (1.34) 

k=l 



is conventionally normalized by the space-time volume V so that it has a meaningful 
infinite-volume limit. 
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Fig. 1.5 Expectation value of the eigenvalues of (D^D) 1 ^ 2 below 116 MeV in 0(a)-improved 
two-flavour QCD on a 2L x L 3 lattice at two values of L. Both spectra were obtained at 
lattice spacing a = 0.08 fm and renormalized sea-quark mass m = 26 MeV. From the smaller 
to the larger lattice, the number of modes per MeV increases approximately as predicted by 
the Banks-Casher relation. 

In a now famous paper, Banks and Casher (1980) showed many years ago that the 
density at the origin, 

lim lim lim p(A, m) — — E, (1.35) 

A->0 m->0 V—>oo TT 

is proportional to the quark condensate 



£ = — lim lim (uu) 
m— >0 V— >oc 



u: up-quark field, 



(1.36) 



in the chiral limit. The spontaneous breaking of chiral symmetry in QCD is thus linked 
to the presence of a non-zero density of eigenvalues at the low end of the spectrum of 
the Dirac operator. 

Since the eigenvalues at — m 2 + \\ of D^D are simply related to the eigenvalues 
of D, the Banks-Casher relation (1.35) immediately leads to the estimate 



v{M, m) ~ -ASF, A 2 = M 2 



(1.37) 



for the number of low modes of D'D with eigenvalues ctk < M 2 . If one sets m = 0, 
M = 100 MeV and £ = (250 MeV) 3 , for example, and considers a space-time volume 
of size V = 2L 4 , the mode numbers are estimated to be 21, 106 and 336 for L = 2, 3 
and 4 fm, respectively. As illustrated by Fig. 1.5, the low-mode condensation is readily 
observed in numerical simulations of lattice QCD. 

Since v{M, m) increases proportionally to space-time volume V, an effective defla- 
tion of the Dirac equation requires 0(V) modes to be deflated. The associated com- 
putational effort increases like V 2 and straightforward deflation consequently tends to 
become inefficient or impractical at large volumes. However, as explained in the fol- 
lowing, the F 2 -problcm can be overcome using inexact deflation (Giusti et ai, 2003) 
and domain-decomposed deflation subspaccs (Liischcr, 2007a). 
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1.3.3 Deflation w/o eigenvectors 

Returning to the lattice Dirac equation (1.1), a more general form of deflation will now 
be described, which starts from an unspecified set <pi, . . . , <f>N of N orthonormal quark 
fields. The linear space S spanned by these fields will play the role of the deflation 
subspace, but is not assumed to be an eigenspace of the Dirac operator. 

The action on any quark field ip of the orthogonal projector P to S is given by 

N 

PV = J>fc(0fc,tf)- (1-38) 

k=l 

As before, the restriction PDP of the Dirac operator to S will be referred to as the 
"little Dirac operator" . Its action is encoded in the complex N x N matrix 

Am = (4>k,D<f)t) , k,l = l,...,N, (1.39) 

through 

N 

PDP^ = <t>k A kl (<&, VO ■ (1-40) 

A;, 2 = 1 

A technical assumption made in the following is that A (and thus PDP as an operator 
acting in S) is invertible. 

The form of inexact deflation discussed below is based on the projectors 



P L = 1 - DP{PDP)- 1 P, (1.41) 

P R = 1- P{PDP)~ l PD. (1.42) 
It is not difficult to prove that 

PI = Pl, P 2 r = Pr, (1-43) 

PP L = P R P = (1 - P L )(l - P) = (1 - P)(l - P R ) - 0, (1.44) 

P L D = DP R . (1.45) 



In particular, the operator Pj, projects any quark field to the orthogonal complement 
S 1 - of the deflation subspace. Note, however, that Pl and Pr are not hcrmitian and 
therefore not ordinary orthogonal projectors. 

The Dirac equation (1.1) may now be split into two decoupled equations, 

DV||=r?||, Di/) X =VX, (1-46) 

where 

V>|| = (1 - Pr)^>, V± = Pr1>, (1-47) 

r/|| = (1 - P L )r h v ± = PlV- (1-48) 

The use of a different projector for the splitting of the solution ip and the source r\ is 
entirely consistent in view of the commutator relation (1.45) and merely reflects the 
fact that the Dirac operator is not hermitian. 
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Fig. 1.6 The low modes x( x ) °f the lattice Dirac operator in the free-quark theory are plane 
waves with small momenta p. These can be well approximated in the norm by functions that 
are constant on blocks of lattice points if the block size b satisfies bp <C 1. Efficient deflation 
subspaces can thus be constructed using fields that are discontinuous and therefore far from 
being approximate eigenmodes of the Dirac operator. 

The solution of the little system, 

N 

= PiPDPyiPr) = Y, (<t>i,v) , (1-49) 

k,l=l 

is easily found, but the other equation can only be solved using an iterative procedure 
such as the GCR algorithm. However, the operator on the left of the equation is the 
deflated operator 

D = DP R = P L D(1 - P), (1.50) 

which acts on quark fields in S 1 - and which may have a much smaller condition number 
than D, particularly so at small quark masses. An acceleration of the computation is 
then achieved since the solution is obtained in fewer iterations than in the case of the 
unmodified Dirac equation. 

The condition number k(D) = \\D\\ H-D -1 1 of D depends on the quark mass mainly 
through the factor 

WD^W = ||(1 - P)D~ 1 (1 - P)\\ < ||(1 - P)(Z? t D)- 1 (l - P)|| 1/2 . (1.51) 

It is then quite obvious that k(D) will be much smaller than k(D) if 1 — P effectively 
"projects away" the low modes of D^D, i.e. if they are well approximated by the defla- 
tion subspace. However, contrary to what may be assumed, this requirement is fairly 
weak and does not imply that the deflation subspace must be spanned by approximate 
eigenvectors of DID (see Fig. 1.6). 

1.3.4 Domain-decomposed deflation subspaces 

In the following, a decomposition of the lattice into rectangular blocks A such as the 
one shown in Fig. 1.7 will be considered. A special kind of deflation subspace S is then 
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Fig. 1.7 Two-dimensional view of a 96 x 48 lattice, divided into 8192 non-overlapping blocks 
of size 6 4 . 

obtained by choosing a set of orthonormal fields cj)f,..., </>^ on each block A and by 
taking S to be the linear span of all these fields. The associated projector is 

P = ^P A , P A V = X>fc O^-VO- (1-52) 

A fe=l 

Evidently, this construction fits the general scheme discussed in Section 1.3.3 except 
perhaps for the labeling of the fields that span the deflation subspacc. 

The size of the blocks A is a tunable parameter of domain-decomposed subspaces. 
Usually the blocks are taken to be fairly small (4 4 , 6 4 or 8 x 4 3 , for example), but the 
exact choice should eventually be based on the measured performance of the deflated 
solver. A key feature of domain-decomposed subspaces is the fact that their dimension 
is proportional to the volume V of the lattice, while the application of the projector 
P to a given quark field requires only 0(N s V) floating-point operations (and not 
0(N s V 2 ) operations as would normally be the case). Such subspaces may thus allow 
the y 2 -problem to be overcome, provided high deflation efficiencies can be achieved 
for some volume-independent number N s of block modes. 

In the case of the free-quark theory, Fig. 1.6 suggests that this strategy will work 
out if the block modes are chosen to be constant. Since quark fields have 12 complex 
components, one needs N s = 12 such modes per block. In presence of an arbitrary 
gauge field, the choice of the block modes is less obvious, however, because the notion 
of smoothness ceases to have a well-defined meaning. 

1.3.5 Local coherence &; subspace generation 

It is helpful to note at this point that the deflation deficits 

ll(i - P)xf = E IK 1 - p A)xll 2 (!- 53 ) 

A 

of the low modes \ of D'D can only be small if they are small on all blocks A. Since the 
local deflation subspacc has fixed dimension N s , and since the number of low modes is 
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Fig. 1.8 When restricted to a small block of lattice points, the 0(V) low modes of the Dirac 
operator tend to align to a relatively low-dimensional linear space, a property referred to as 
local coherence. 

proportional to V and thus tends to be much larger than N s , this condition cannot in 
general be met unless the low modes happen to collapse to a lower dimensional space 
on each block, i.e. unless they are "locally coherent" (see Fig. 1.8). 

The local coherence of the low modes is numerically well established. On a 64 x 32 3 
lattice with spacing a = 0.08 fm, for example, deflation deficits as small as a few percent 
are achieved when using 4 4 blocks and N s = 12 block fields. Moreover, N s does not 
need to be adjusted when the lattice volume increases. So far, however, no theoretical 
explanation of why the modes are locally coherent has been given. In particular, it is 
unclear whether the property has anything to do with chiral symmetry. 

Efficient domain-decomposed deflation subspaces can now be constructed fairly 
easily (Liischer, 2007a). One first notes that any quark field ip satisfying ||-D^|| < 
M||?/>|| for some sufficiently small value of M (say, M = 100 MeV) is well approximated 
by a linear combination of low modes and is therefore locally coherent with these. By 
generating a set ip\ , . . . , ^jv 8 of independent fields of this kind, using inverse iteration, 
for example, and by applying the Gram-Schmidt orthonormalization process to the 
projected fields 



one thus obtains a basis (f>^, . . . , (f>^ of block fields with large projections to the low 
modes 2 . The generation of the deflation subspacc along these lines requires a modest 
amount of computer time and certainly far less than would be needed for an approxi- 
mate computation of the low modes of the Dirac operator. Moreover, since N s can be 
held fixed, the computational effort scales like V rather than V 2 . 

1.3.6 Solving the deflated system 

Once the deflation subspace is constructed, the deflated equation Difj± = ij± can in 
principle be solved using the GCR algorithm with D replaced by D = PlD. However, 
the application of the projector Pi requires the little system to be solved for a given 
source field, which is not a small task in general. Note that the little Dirac operator 




if x G A, 
otherwise, 



(1.54) 



2 The deflation subspace can alternatively be generated "on the fly" while solving the Dirac equa- 
tion, exploiting the fact that the residue of an approximate solution tends to align to the low modes 
of the Dirac operator (Brannick et al, 2008; Clark et al, 2008 a). 
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Fig. 1.9 Computer time needed for the solution of the 0(a)-improved Wilson-Dirac equation 
in two-flavour QCD on a 64 x 32 3 lattice with spacing a = 0.08 fm. In these tests, the sea-quark 
mass m sea was 26 MeV, the valence-quark mass m va i ranged from about 15 to 90 MeV and 
the relative residue of the solution was required to be 10~ 10 . All timings were taken on a PC 
cluster with 64 (single-core) processors. 

acts on fields on the block lattice with N s complex components. An exact solution of 
the little system is therefore not practical on large lattices. 

In the case of the Wilson-Dirac operator and its relatives, the little Dirac opera- 
tor has only nearest-neighbour couplings among the blocks. The solution of the little 
system may then be obtained iteratively using the even-odd preconditioned GCR al- 
gorithm, for example. The effort required for the solution of the little system is nev- 
ertheless not completely negligible and it is advisable to consider solving the deflated 
right-preconditioned system 

P L DRcf> = r] ± , ipj_ = PrR4>, (1.55) 

instead of the deflated equation directly, the operator R being a suitable preconditioner 
for D. A preconditioner that has been used in this context is the Schwarz alternating 
procedure (Liischer, 2004). The important point to note is that the preconditioner 
tends to reduce the high-mode components of the residue of the current approximate 
solution, while the low-mode component of the residue is projected away by the pro- 
jector Pl. Deflation and right-preconditioning thus tend to complement one another. 

The performance figures plotted in Fig. 1.9 show that local deflation works very 
well in lattice QCD. In this study, the block size was taken to be 4 4 and N s was set to 
20. With respect to the even-odd preconditioned BiCGstab algorithm (points labeled 
EO+BiCGstab in the figure), the deflated Schwarz-prcconditioned GCR algorithm 
(DFL+SAP+GCR) achieves an acceleration by more than an order of magnitude at 
the smallest quark masses considered. 
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On other lattices, the deflated Schwarz-preconditioned solver for the 0(a)-improved 
Wilson-Dirac equation performs as well as in the case reported in Fig. 1.9. In partic- 
ular, the accumulated experience unambiguously shows that the algorithm overcomes 
the y 2 -problem and that the critical slowing down towards the chiral limit, which 
previously hampered quark-propagator computations, is nearly eliminated. 



2 

Simulation algorithms 



Lattice QCD simulations are based on Markov chains and the concept of importance 
sampling. More specifically, most large-scale simulations performed today rely on some 
variant of the so-called Hybrid Monte Carlo algorithm (Duanc et at, 1987). An ex- 
ception to this rule are simulations of the pure SU(3) gauge theory, where link-update 
algorithms are usually preferred for reasons of efficiency. 

QCD simulation algorithms have a long history and incorporate many ideas and 
improvements. Some important deficits remain, however, and further progress in al- 
gorithms will no doubt be required to be able to perform accurate simulations of a 
significantly wider range of lattices than is possible at present. 

2.1 Importance sampling 

2.1.1 Statistical interpretation of the functional integral 

The quark fields in the QCD functional integral take values in a Grassmann algebra. So 
far no practical method has been devised that would allow such fields to be simulated 
directly. The theory simulated is therefore always the one obtained after integrating 
out the quark fields. 

In the case of two-flavour lattice QCD with mass-degenerate Wilson quarks, the 
partition function then assumes the form 



where D(U) denotes the massive Wilson-Dirac operator in presence of the gauge field 
U, S g (U) the gauge action and dU(x,fi) the SU(3)-invariant integration measure for 
the link variable U(x,/-i). Since D< = 75-D75, the quark determinant detZ? is real and 
the product 



is therefore a normalized probability density on the space of all gauge fields. The 
physics described by the theory is eventually extracted from expectation values 





(2.1) 




(2.2) 




(2.3) 



of observables 0(U) such as a Wilson loop or a quark-line diagram. From this point of 
view, lattice QCD thus looks like a classical statistical system, where the states (the 



Importance sampling 17 



gauge-field configurations) occur with a certain probability and where one is interested 
in the expectation values of some properties of the states. 

The simulation algorithms discussed in the following depend on the existence of 
a probabilistic representation of the theory. In particular, they do not apply to two- 
flavour QCD with non-degenerate quark masses and three-flavour QCD unless the 
product of the quark determinants is guaranteed to be non-negative (as is the case if 
the lattice Dirac operator preserves chiral symmetry). 

2.1.2 Representative ensembles 

Representative ensembles {U±, . . . , Ujy} of gauge fields are obtained by choosing the 
fields randomly with probability D[U]p(U), i.e. such that 

no. of fields in <K = / D[U] p(U) + 0(A^ 1/2 ) (2.4) 

for any open region $K in field space, the term of order N~ x / 2 being a statistical error 
that depends on 9\ and the generated ensemble of fields. The high-probability regions 
thus contain many fields Ui and are therefore sampled well, while in other areas of 
field space there may be only a few fields or none at all (if N is not astronomically 
large). 

Given a representative ensemble of fields, the expectation values of the obscrvables 
of interest can be estimated through 

1 - 

{0) = -Y.°^) + °^ 1/2 )- ( 2 - 5 ) 

i=l 

A bit surprising may be the fact that results with small statistical errors can often be 
obtained in this way even if the ensemble contains only 100 or perhaps 1000 field con- 
figurations. Naive estimates, taking the dimension of field space into account, actually 
suggest that an accurate numerical evaluation of the functional integral requires some 
fc 32 ™ configurations, where n is the number of lattice points and k at least 10 or so. 

The apparent paradox is resolved by noting that small ensembles of field configu- 
rations can only capture some aspects of the theory. That is, one should not expect to 
obtain the expectation values of all possible obscrvables with small statistical errors. 
Typically any quantity sensitive to the correlations of the field variables at large dis- 
tances tends to have large statistical errors, sometimes to the extent that the results 
of the computation are completely useless. 

2.1.3 Translation symmetry and the infinite-volume limit 

In order to minimize finite-volume effects, periodic boundary conditions are usually 
imposed on all fields, an exception being the quark fields, which are often taken to be 
anti-periodic in time. Since the translation symmetry of the theory is preserved by this 
choice of boundary conditions, representative ensembles of gauge fields are expected 
"to look the same" in distant regions of a large lattice. 
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The meaning of this statement is best explained by dividing the lattice into blocks 
A, as in Section 1.3.4, and by considering an extensive quantity 

£ = Y,0{x), (2.6) 

X 

where 0(x) is some local gauge- invariant field. Translation symmetry then implies 
that the contributions (0\) to the sum 

(£>=£<£>A>, A = ^0(x), (2.7) 

A ieA 

are all equal. Moreover, the statistical fluctuations of 0\ and 0\' are practically 
uncorrelated and tend to cancel one another in the sum (2.7) if the blocks A and A' 
are separated by a distance larger than the range of the connected correlation function 
of 0(x). For a fixed ensemble size, the statistical error of the density (£)/V therefore 
decreases like V" 1 ! 2 for V oo, i.e. the statistics is effectively multiplied by a factor 
proportional to V . 

The discussion also illustrates the fact that the efficiency of importance sampling 
depends on the observable considered and on how its expectation value is calculated. 
Rather than from eqn (2.6), one might actually start from the identity (£) = V(O(0)), 
in which case the information contained in the field ensemble away from the origin 
x = remains unused. Both calculations yield the correct expectation value, but only 
the first profits from the available data on the full lattice and consequently obtains 
the result with much better statistical precision. 

2.1.4 Simulating gaussian distributions 

Representative ensembles of fields can be easily constructed in the case of a complex 
field 4>{x) distributed according to the gaussian probability density 

p A {4>) oc exp{-^2<l>{x^(Axf,)(x)}, (2.8) 

X 

where A is a hcrmitian, strictly positive linear operator. If one sets 

A = —A + mg, A: lattice laplacian, (2-9) 

for example, the theory describes a free scalar field with bare mass mo. Another pos- 
sible choice of A is 

A = (DD^y 1 , D: lattice Dirac operator. (2.10) 

The field 4>(x) must be a pseudo-fermion field in this case, i.e. a complex-valued and 
therefore bosonic quark field (see Section 2.5.1). 
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A representative ensemble of fields 4>i,- ■ - ,4>n may be generated for any gaussian 
distribution by randomly choosing a set xi, ■ • ■ , Xn of fields with normal distribution 
Pi(x) and by setting 

fa{x) = (B Xi )(x), i=l,...,N, (2.11) 
where B is an operator satisfying 

A={BB^)-\ (2.12) 

In the case of the pseudo-fermion fields, for example, one can simply take B = D, 
while a possible choice in the case of the free scalar field is 

B = (-A + mly 1/2 . (2.13) 

It is then straightforward to check that the fields fa generated in this way are correctly 
distributed. 

Since the fields Xi{ x ) arc normally distributed, their components at different lattice 
points are decoupled and can be drawn randomly one after another. The generation of 
random numbers on a computer is a complicated subject, however, with many open 
ends and a vast literature (for an introduction, see Knuth (1997), for example). For the 
time being, one of the simulation-quality random number generators included in the 
GNU Scientific Library (http://www.gnu.org/software/gsl) may be used, among 
them the ranlux generator (Luscher, 1994; James, 1994), which is based on a strongly 
chaotic dynamical system and thus comes with some theoretical understanding of why 
the generated numbers are random. An efficient ISO C code for this generator can be 
downloaded from http://cern.ch/luscher/ranlux. 

2.2 Markov chains 

Gaussian distributions are an exceptionally simple case where representative ensembles 
of fields can be generated instantaneously In general, however, representative ensem- 
bles are generated through some recursive procedure (a Markov process) which obtains 
the field configurations one after another according to some stochastic algorithm. 

This section is devoted to a theoretical discussion of such Markov processes. Rather 
than QCD or the SU(3) gauge theory, an abstract discrete system will be considered 
in order to avoid some technical complications, which might obscure the mechanism 
on which Markov-chain simulations are based. The discrete system is left unspecified, 
but is assumed to have the following properties: 

(a) There is a finite number n of states s. 

(b) The equilibrium distribution P(s) satisfies P(s) > for all s 
and £ s P(s) = l. 

(c) The observables are real-valued functions 0(s) of the states s. 
One is then interested in calculating the expectation values 

(O) = T, s O(s)P(s) (2.14) 



of the observables O(s). 
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2.2.1 Transition probabilities 

A Markov chain is a random sequence s±, S2, S3, ... , sn of states, where Sk is obtained 
from Sfc_i through some stochastic algorithm. The chain thus depends on the initial 
state si and the transition probability T(s — > s') to go from the current state s to 
the next state s'. In the following, the basic idea is to choose the latter such that the 
Markov chain provides a representative ensemble of states for large N. The expectation 
value of any observable O(s) is then given by 



where the error term is dominated by the random fluctuations of the chain. 

When trying to construct such transition probabilities, one may be guided by the 
following plausible requirements: 

1. T(s -> s') > for all s, s' and J2 S > T(s -> s') = 1 for all s. 

2 - E s P(s)T(s -> s') = P(s') for all s' . 

3. T{s -> s) > for all s. 

4. If S is a non-empty proper subset of states, there exist two states 
s e S and s' <£ S such that T(s s') > 0. 

Property 1 merely guarantees that T(s — > s') is a probability distribution in s' for any 
fixed s, while property 2 says that the equilibrium distribution should be preserved 
by the update process. The other properties ensure that the Markov process does not 
get trapped in cycles (property 3, referred to as "aperiodicity" ) or in subsets of states 
(property 4, "ergodicity"). 

Later it will be shown that any transition probability T(s — > ,s') satisfying 1 — 4 
generates Markov chains that simulate the system in the way explained above. These 
properties are thus sufficient to guarantee the correctness of the procedure. 

2.2.2 The acceptance-rejection method 

For illustration, an explicit example of a valid transition probability will now be con- 
structed. In order to simplify the notation a little bit, the states are labeled from to 
n — 1 and are thought to be arranged on a circle so that the neighbours of the state i 
are the states i± 1 mod n. The construction then starts from the transition probability 



which generates a random walk on the circle. This transition probability satisfies 1—4, 
the equilibrium distribution being the flat distribution Po(i) — 1/n. 

Now if the equilibrium distribution is not flat, as the one shown in Fig. 2.1, a valid 
transition probability is given by (Metropolis et ai, 1953) 



T(i ->■ j) = T (i ->■ i)Pacc(^i) + &jE*zb(* fc ) - p ^ l > fc )) ' ( 2 - 17 ) 




(2.15) 




if j = iorj = i±l mod n, 
otherwise, 



(2.16) 
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probability 



Fig. 2.1 Example of an equilibrium probability distribution of the abstract discrete system 
considered in this section. The update algorithm implementing the transition probability 
(2.17) generates a random walk in the space of states, where, in each step, one moves from 
the current state i to one of its neighbours j with probability |P aC c(i, j) or else stays at i. 

where 

P^ c (i,j)=min{l,P(j)/P(i)} (2.18) 

is the so-called acceptance probability. In other words, starting from the current state 
i, the next state j is proposed with probability Tq(i — > j). A random number r £ [0, 1] 
is then chosen, with uniform distribution, and j is accepted if P(J) > rP(i). If the 
proposed state is not accepted, the next state is taken to be the state i. 

The acceptance-rejection method is widely used in various incarnations. In general, 
the challenge is to find an a priori transition probability To(i — > j) where the proposed 
states are accepted with high probability, as otherwise the simulation will be very slow 
and therefore inefficient. The transition probability (2.17) incidentally satisfies 

P(i)T(i -> j) = P(j)T(j -> i) for all (2.19) 

a property referred to as "detailed balance". In the literature, detailed balance is 
sometimes required for acceptable algorithms, but the condition is quite strong and 
not needed to ensure the correctness of the simulation. 

2.2.3 Implications of properties 14 

In the following, it is assumed that T(s — > s') is a given transition probability satisfy- 
ing the conditions 1-4 listed in Section 2.2.1. The associated Markov process is then 
shown to have certain mathematical properties, which will allow, in Section 2.2.4, to 
determine its asymptotic behaviour at large times. 

Let T-L be the linear space of real- valued functions f(s) defined on the set of all 
states s. A useful norm of such functions is given by 

l|/||i = EJ/(*)|. (2.20) 

The transition probability T(s — > s') defines a linear operator T in % through 

{Tf){s') = Y JS f{s)T{s^s'). (2.21) 

Note that, by properties 1 and 2, the equilibrium distribution P(s) has unit norm and 
is an eigenfunction of T with eigenvalue 1. 
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Lemma 2.1 For all f e H, the bound ||T/||i < ||/||i holds. Moreover, if T f = /, 
there exists c € M such that f(s) = cP(s) for all states s. 

Proof: Any given function / G % may be decomposed into positive and negative parts 
according to 

/(«) = /+(*)-/_(*), f±(s) = HI/001 ± /(*)} > 0. (2.22) 
Property 1 then implies 

l|77±lli = E fl , |E S /±(*)T( S -> S ')| = Z s f±(s)E s .T(s »') = H/illr. (2.23) 
Using the triangle inequality, this leads to 
||T/|| 1 = ||T/ + -T/_|| 1 

< IITZ+Hr + ||T/_|| X = HZ+llx + ||/- ||i = H/lli, (2.24) 

which proves the first statement made in the lemma. Moreover, eqn (2.24) shows that 
the equality ||T/||i = ||/||i holds if and only if Tf + and T/_ have disjoint support. 

Now let / £ H be such that Tf = f. The functions T/+ and T/_ must have disjoint 
support in this case and are both non-negative. Since the decomposition Tf = /+ — /_ 
in positive and negative parts is unique, it follows that Tf + = / + and Tf— = /_. 
Properties 3 and 4 however imply that the support of a non-negative function grows 
when T is applied, unless the support is empty or the whole set of states. Either /+ 
or /_ must therefore be equal to zero. The function / thus has a definite sign. 

Another function g e % may now be defined through 

g(s) = f(s)-cP{s), c =£./(*)• (2.25) 

This function must also have a definite sign since Tg — g. Moreover, £ s g(s) = by 
construction, which can only be true if g = 0, i.e. if f(s) = cP(s). D 

Lemma 2.1 shows that the equilibrium distribution is the only distribution that is 
stationary under the action of the operator T. Some control over the complementary 
space 

Ho = {/ G H | £ s /(s) = 0} (2.26) 

of non-stationary functions will later be required as well. To this end, it is helpful to 
introduce a funny scalar product in H, 

(f,9) = E s mP(s)- 1 g(s), (2.27) 
and the associated norm ||/|| = (/, f) 1 ^ 2 . 

Lemma 2.2 There exists < p < 1 such that \\Tf\\ < p\\f\\ for all f £ Hq. 



Markov chains 23 



Proof: With respect to the scalar product (2.27), the adjoint of the operator T 
may be defined and thus also the symmetric operator T = T^T . The action of T on 
any function / e % is given by 

(Tf)(s') = E s mf(s s'), (2.28) 
r(« *') = Er T ( s ->■ rJPfr)-^^ ->■ r)P(s')- (2.29) 

A moment of thought then shows that T(s s') is a transition probability satisfying 
conditions 1-4. In particular, Lemma 2.1 holds for T as well. 

Since T is symmetric, there exists a complete set of eigenfunctions u, € "H of T 
with real eigenvalues A^ (i = 0, 1, . . . , n — 1). Without loss one may assume that 

A > Ai > ... > A„_i, {v i ,v j )=5 ij . (2.30) 

Moreover, noting A; = (vi,Tvi) = |jT«i|| 2 > 0, lemma 2.1 (for T) implies 

A = 1, v = P and Ai < 1. (2.31) 

Now since the states Vi, i > 1, satisfy 

0=(«o,«i)=E.«i(«), ( 2 -32) 

they form a basis of the subspace Ho, i.e. T maps "Ho m to itself and its largest eigen- 
value in this subspace is Ai. As a consequence, 

\\Tf\\ 2 = (f,ff)< Aill/H 2 for all /£«„. (2.33) 

This proves the lemma and also shows that the smallest possible value of the constant 
p is A/ . U 

2.2.4 Statistical properties of Markov chains 

When analyzing Markov chains, an important conceptual point to note is that one 
cannot reasonably speak of the statistical properties of a single chain. However, one 
can ask what the average properties of the generated sequences of states are if many 
independent chains are considered. In practice, this corresponds to running several 
simulations in parallel, with different streams of random numbers. 

In the following theoretical discussion, it will be assumed that an infinite number 
of independent simulations of length N have been performed, with the same initial 
state si and a transition probability T(s —> s') that has properties 1-4. As before, the 
expectation value of an observable O will be denoted by (O) , while 

1 N 

°=nT,°^) (2-34) 

fc=i 

stands for its average over the states si, S2, ■ ■ ■ , Sjv generated in the course of a given 
simulation. One may then also consider functions </>(si, . . . , sjv) of these sequences of 
states and their average ((</>)) over the infinitely many parallel simulations. 
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(a) Probability distribution of the states. The state s k generated after k — 1 steps 
changes randomly from one simulation to another. It is not difficult to show that the 
probability P&(s) = ((8 SSk )} for s k to coincide with s is given by 

P k {s) = J2 T ( s i -> s 2)T(s 2 -► s 3 ) • • • T(s k ^ -> s) 

S2,S3,...,Sjt_l 

= (T fe - 1 P )( S ), P (s)=<W (2-35) 
Noting Po = P + f, f £ Ho, property 2 and lemma 2.2 then imply 

Pk{s) = P(s) + 0(c- k / T ), (2.36) 

k— >oo 

where t = — 1/ In p > is the so-called exponential autocorrelation time of the Markov 
process. The state s k is thus distributed according to the equilibrium distribution if k 
is much larger than r. In particular, after so many steps, there is no memory of the 
initial state S\ anymore and one says that the simulation has "thermalized" . 

(b) Calculation of expectation values. Together with the generated states, the average 
O of the "measured" values of an observable O fluctuates randomly about the mean 
value 

JV N 

«°» = jvE«°( s *)» =E°( s )]v^ Pfc(s) - (2 - 37) 

k=l s k=l 

Recalling eqn (2.36), this formula shows that 

«£>» = (O) (2.38) 

if the first fc»T measurements of O are dropped. Averages of the measured values 
calculated after thcrmalization thus coincide with the expectation value (O) up to 
statistical fluctuations. 

(c) Autocorrelation functions. The states Sk in a Markov chain are statistically de- 
pendent to some extent, because they are generated one after another according to 
the transition probability T(s s'). As a consequence, the measured values of an 
observable O are statistically correlated, i.e. the autocorrelation function 

r(t) = ((0(s k )0(s k+t )}) ((0(s k )))((0( Sk+t ))) (2.39) 

does not vanish. For k 3> r and t > 0, the autocorrelation function is independent of 
k and given by 

r(f) = ]T P(s k )0{s k )T( Sk -> s fc+ i) . . . T( Sfe+t _ 1 -> Sfc+t )0(s fc+f ) - (O) 2 . 

(2.40) 

Moreover, noting P(s)0(s) = P(s)(0) + f(s), f € Ho, it follows from this expression 
and lemma 2.2 that T(t) falls off exponentially, like e~*/ r , at large separations t. The 
measured values 0(sj) and O(sj) are thus independently distributed if \i — j\ ^> r. 
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(d) Statistical fluctuations. The statistical variance of the averages O of an observable 
O is, after thermalization, given by 

1 N 9t 
«(° - <°» 2 » = N* E - *'D = r (°)lf + °( N ~*)> ( 2 - 41 ) 

1,3=1 



where 



9 ^r(o) 1 J 



TO 

2 



denotes the so-called integrated autocorrelation time of O. Up to terms of order N 3 / 2 , 
the standard deviation of O from its expectation value (O) is thus 



1/2 

a = (Tq 



(^) 17 ', a o = ((0-(0))y/ 2 , (2.43) 



i.e. the statistical error of O decreases proportionally to N~ x / 2 . Note that <jq is just the 
standard deviation of O in equilibrium. In particular, oo is a property of the system 
rather than of the Markov process. The integrated autocorrelation time to, on the 
other hand, often strongly depends on the simulation algorithm. 

The fact that the measured values of O are correlated leads to an increase of the 
variance of O by the factor 2to and thus lowers the efficiency of the simulation. In 
practice one frequently chooses to measure the observables only on a subsequence of 
states separated by some fixed distance At in simulation time. As long as At is not 
much larger than 2to, the depletion of the measurements has no or little influence on 
the statistical error of O and therefore helps reducing the computational load. 

2.3 Simulating the SU(3) gauge theory 

The theory of Markov chains developed in Section 2.2 can be extended to non-discrete 
systems like the pure SU(3) gauge theory on a finite lattice. Markov chains are se- 
quences U\, U2, ■ ■ ■ , Un of gauge-field configurations in this case, which are generated 
according to some transition probability. The latter must satisfy certain conditions 
analogous to those listed in Section 2.2.1 for the discrete system. The mathematics 
required at this point however tends to be quite heavy and no attempt will be made 
to prove of the correctness of the procedure (see Tierney (1994), for example). 

2.3.1 Transition probability densities 

The equilibrium probability density to be simulated is 

p(JJ) = L e SAU)^ z = J D [[r] e -S 8 (!') ) (2.44) 

where the gauge action S g (U) is assumed to be a bounded function of the gauge field 
U. Markov processes in this theory are characterized by a transition probability density 
T(U -> U') that specifies the probability D[U'] T(U — > U') for the next configuration 
to be in the volume element D[t/'] at U' when the current configuration is U. 
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Note that transition probability densities may involve (5-function and other singu- 
larities, but the product D[U'} T(U — > U') must be a well-defined measure on the field 
manifold for any given U . The obvious requirements are then 



Further conditions need to be added, however, in order to guarantee the aperiodicity, 
the ergodicity and thus the convergence of the Markov process. A sufficient but fairly 
strong condition is 

3. Every gauge field V has an open neighbourhood AT in Geld space 
such that T(U ~^U')>e for some e > and all U, U' € Af. 

This property ensures that, in every step, the Markov process spreads out in an open 
neighbourhood of the current field. Moreover, using the compactness of the field man- 
ifold, it is possible to show that the process will reach any region in field space in a 
finite number of steps. 

While properties 1-3 guarantee the asymptotic correctness of the simulations, the 
rigorous upper bounds on the exponential autocorrelation time obtained in the course 
of the convergence proofs tend to be astronomically large. In practice, simulations of 
lattice QCD therefore remain an empirical science to some extent, where one cannot 
claim, with absolute certainty, that the simulation results are statistically correct. 

2.3.2 Link-update algorithms 

If T\ and T2 are two transition probability densities satisfying 1 and 2, so does their 
composition 



An update step according to the composed transition probability density first obtains 
the intermediate field V with probability D[V] Ti(U — > V) and then generates U' with 
probability D[U'] T2(V — > U'). Composition allows simulation algorithms for fields to 
be constructed from elementary transitions, where a single field variable is changed at 
the time. 

Link-update algorithms generate the next gauge field by updating the link variables 
one after another in some order. The Metropolis algorithm, for example, proceeds as 
follows (the SU(3) notation is summarized in Section 2.3.6): 

(a) Select a link (x,p) and choose X <E su(3) randomly in the ball \\X\\ < e, 
with uniform distribution, where e is some fixed positive number. 

(b) Accept U'(x, /i) = e x U(x, /i) as the new value of the link variable on 
the selected link with probability P acc = min{l, e Sg ^ c/ ^ _s ' E ^ c/ ">}. 

(c) Leave the link variable unchanged if the new value proposed in step (b) 
is not accepted. 



1. T(U -^U')>0 for all U, U' and J B[U'] T(U ->• U') = 1 for all U. 

2. / D[U] p(U)T(U -> U') = p(U') for all U'. 




(2.45) 
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It is not difficult to write down the transition probability density corresponding to the 
steps (a)-(c) and to check that it satisfies conditions 1 and 2. Moreover, the complete 
update cycle, where each link is visited once, satisfies condition 3 as well. 

If the gauge action is local, the calculation of the action difference S g (U') — S g (U) 
in step (b) involves only the field variables residing in the vicinity of the selected link. 
The computer time required per link update is then quite small. When all factors are 
taken into account, including the autocorrelation times, a more efficient link-update 
algorithm is however provided by the combination of the heatbath algorithm discussed 
below with a number of microcanonical moves (Section 2.3.5). 

2.3.3 Heatbath algorithm 

As a function of the field variable U(x,/j,) residing on a given link (x,fx), the Wilson 
plaquette action is of the form 

S e (U) = -Retr{U(x, fx)M(x, fx)} + ..., (2.46) 

where M(x, fi) and the terms represented by the ellipsis do not depend on U(x, /i). Up 
to a constant factor involving the bare gauge coupling go, the complex 3x3 matrix 



x-\-v 



90 



(2.47) 



X+jl 



coincides with the "staple sum" of Wilson lines from x + /t to x. Other popular gauge 
actions, including the 0(a 2 )-improvcd Symanzik action, are of the same form except 
that M(x, fi) gets replaced by a sum of more complicated Wilson lines. 

The heatbath algorithm (Creutz, 1980) is a link-update algorithm, where the new 
value U'(x, /i) of the field variable on the selected link is chosen randomly with prob- 
ability density proportional to exp (Retr{Z7'(ar, fj,)M(x, /J,)})- In other words, the link 
variable is updated according to its exact distribution in the presence of the other field 
variables. 

While this algorithm fulfills conditions 1-3, it is difficult to implement in practice 
exactly as described here. However, for gauge group SU(2) (the case considered by 
Creutz (1980)), the situation is more favourable and there are highly efficient ways to 
generate random link variables with the required probability distribution (Fabricius 
and Haan, 1984; Kennedy and Pendleton, 1985). 

2.3.4 The Cabibbo Marinari method 

The practical difficulties encountered when implementing the heatbath algorithm in 
the SU(3) theory can be bypassed as follows (Cabibbo and Marinari, 1982). Let v E 
SU(2) be embedded in SU(3) through 

'Vu Vi2 0\ 

V = | V2i v 2 2 . (2.48) 
1/ 
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A correct one-link update move, U(x,[i) — > U'(x, /i), is then obtained by choosing w 
randomly with probability density proportional to exp (Retr{VU(x, fx)M (x, /i)}) and 
by setting U'(x, /j.) = VU(x, /i). A moment of thought reveals that the distribution of 
v is of the same analytic form as the link distribution in the SU(2) theory. The highly 
efficient methods developed for the latter can thus be used here too. 

In order to treat all colour components of the link variables democratically, different 
embeddings of SU(2) in SU(3) should be used. A popular choice is to perform SU(2) 
rotations in the (1, 2), (2, 3) and (3, 1) planes in colour space before proceeding to the 
next link. The ergodicity of the algorithm is then again guaranteed. 

2.3.5 Microcanonical moves 

The link-update algorithms discussed so far tend to become inefficient when the lattice 
spacing a is reduced, a phenomenon known as "critical slowing down" . Typically the 
autocorrelation times of physical observables increase approximately like a -2 . 

Microcanonical algorithms are based on field transformations that preserve the 
gauge action. Such transformations are valid transitions, satisfying conditions 1 and 2, 
provided the field integration measure is preserved too. The trajectories in field space 
generated by a microcanonical algorithm do not involve random changes of direction 
and are therefore quite different from the random walks performed by the other algo- 
rithms. An acceleration of the simulation is then often achieved when microcanonical 
moves are included in the update scheme. 

A microcanonical link-update algorithm for the SU(3) theory is easily constructed 
following the steps taken in the case of the heatbath algorithm (Creutz, 1987; Brown 
and Woch, 1987). The link variable U(x,/j,) on the selected link is again updated by 
applying Cabibbo-Marinari rotations, but the SU(2) matrix v is now set to 

v = . , (2.49) 

where w is a 2 x 2 matrix implicitly defined by 

Reti{VU(x, n)M(x, (j,)} = trjW} + . . . (2.50) 

and the requirement that w is in SU(2) up to a real scale factor. The existence and 
uniqueness of w is implied by the reality and linearity properties of the expression on 
the left of eqn (2.50) (no update is performed if w is accidentally equal to zero). 

The transformation U (x, fx) — > VU (x, /z) defined in this way preserves the gauge 
action since tv{vw^} = tr^t}, but it is less obvious that it also preserves the link inte- 
gration measure, because w and therefore v depend on U(x, fj,), i.e. the transformation 
is non-linear. It is straightforward to show, however, that 

w \u(x,^zu(x,^ = wz1 ( 2 - 51 ) 

for all z £ SU(2). Now if f(U) is any integrable function of U = U(x, //), the substitu- 
tion U — > ZU leads to the identity 

dUf(VU) = f dUf(VU), v = 7-7— f-T- (2-52) 
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Since this equation holds for any z, it remains valid when integrated over SU(2). Using 
the invariance of the group integration measures under left- and right-multiplications, 
one then deduces that 

dUf(VU) = f dz fdUf(VU) = ( dz f dUf(Z^U) = f dUf(U), (2.53) 



which proves that the transformation U — > VU preserves the link integration measure. 

Microcanonical simulation algorithms are not ergodic and must therefore be com- 
bined with an ergodic one. A recommended scheme consists in updating all link vari- 
ables once using the heatbath algorithm and subsequently n times using microcanon- 
ical moves. This combination is more efficient than the pure Metropolis or heatbath 
algorithm and it reportedly has an improved scaling behaviour as a function of the 
lattice spacing if n is scaled roughly like a -1 . 

2.3.6 Appendix: SU(3) notation 

The Lie algebra su(3) of SU(3) consists of all complex 3x3 matrices A that satisfy 

A f = -X and tr{A} = 0. (2.54) 

With this convention, the Lie bracket [A, Y] maps any pair X, Y of su(3) matrices to 
another element of su(3). Moreover, the exponential series 

1 vk 

e* = l+£ir (2.55) 

k=l 

converges to an element of SU(3) (note the absence of factors of i in these and the 
following formulae). 

One can always choose a basis T a , a = 1, . . . , 8, of su(3) such that 

tr | T a T 6| = _l,ja&. ( 2 5 g) 

With respect to such a basis, the elements A <S su(3) are represented as 

8 

A = ^A a T a , A a = -2tr{AT a } e R. (2.57) 

a=l 

Moreover, the natural scalar product on su(3) is given by 

8 

(A, Y)=^2 X a Y a = -2 tr{AF}, (2.58) 



the associated matrix norm being ||A|| = (A, A) 1 / 2 . If not specified otherwise, the 
Einstein summation convention is used for group indices. 



30 Simulation algorithms 



2.4 The Hybrid Monte Carlo (HMC) algorithm 

The inclusion of the sea quarks in the simulations is difficult, because the quark de- 
terminants in the QCD functional integral depend non-locally on the gauge held. In 
particular, one-link update algorithms would require a computational effort propor- 
tional to the square of the lattice volume and are therefore not practical. 

The HMC algorithm (Duane et at, 1987) updates all link variables at once and 
has a much better scaling behaviour with respect to the lattice volume. It will here be 
explained in general terms for an unspecified (possibly non-local) action S(U), which 
is assumed to be real and diffcrcntiable. 

2.4.1 Molecular dynamics 

As an intermediate device, the HMC algorithm requires an su(3)-valued field 

n(x, (j,) = n a (x, fi)T a 7 TT a (x 7 fj.) e K, (2.59) 

to be added to the theory (the SU(3) notation is as in Section 2.3.6). The new field 
is interpreted as the canonical momentum of the gauge field, the associated Hamilton 
function being 

H(n,U) = !(7r,7r) + S(U), (tt.tt) = ^7r°(a;,M)7r a (a:,At). (2.60) 
Evidently, since 

J D[U]0(U)e- s{u) = constant x J D[tt}I>[U]0(U) e~ H ^' u \ (2.61) 

the addition of the momentum field does not affect the physics content of the theory. 

In the form (2.61), the theory is reminiscent of the classical statistical systems that 
describe a gas of molecules. Hamilton's equations 1 , 

(2.62) 

U(x, /i) = ir(x, h)U{x, h), (2.63) 

are therefore often referred to as the "molecular-dynamics equations". As usual, the 
dot on the left of these equations implies a differentiation with respect to time t, which 
is here a fictitious time unrelated to the time coordinate of space-time. The solutions 
of the molecular-dynamics equations, n t (x,[i) and U t {x,[i), are uniquely determined 
by the initial values of the fields at t = 0. They may be visualized as trajectories in 
field space (or, more precisely, in phase space) parameterized by the time t. 

1 The force F(x,/i) informally coincides with 8S(U)/dU(x, fi). Derivatives with respect to the link 
variables however need to be properly denned. According to cqn (2.62), the force field is obtained 
by substituting U(x,fi) — * cxp{aj a (x, fi)T a }U(x, fi), differentiating with respect to the real variables 
u} a (x, fi) and setting uj a (x, /i) = at the end of the calculation. 



Tr(x,fj,) = -F(x,n), F a (x,n) 



dSjc^U) 
duj a (x, jj) 
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2.4.2 The HMC strategy 

The basic idea underlying the HMC algorithm is to pass from the original theory to the 
classical system (2.61) and to evolve the fields by integrating the molecular-dynamics 
equations. Explicitly, the steps leading from the current gauge field U{x 1 /i) to the next 
field U'(x,fi) are the following: 

(a) A momentum Geld ir is generated randomly with probability 
density proportional to exp{ — ^(tt, 7r)}. 

(b) The molecular-dynamics equations are integrated from time 
t = to some later time t = t, taking ir and U as the initial 
values of the fields. 

(c) The new gauge held U' is set to the held U T obtained at time 
t = t through the molecular-dynamics evolution. 

If t is set to a fixed value, as is usually done, the transition probability density corre- 
sponding to the steps (a)-(c) is given by 



where the partition function Z T of the momentum field ensures the correct normal- 
ization and the Dirac 5-function is the one appropriate to the gauge-field integration 
measure. 

It is trivial to check that the transition probability density (2.64) satisfies the 
first of the three conditions listed in Section 2.3.1. The second condition is also ful- 
filled, but some work is required to show this. An important point to note is that the 
molecular-dynamics equations are invariant under time reversal irt, Vt —^r-t, U T -t- 
The molecular-dynamics evolution ttq, Uq — > n T ,U T therefore defines an invertiblc 
transformation of phase space. Moreover, it preserves the Hamilton function and, by 
Liouville's theorem, also the phase space integration measure. It follows from these 
remarks that 



where the second equation is obtained by performing a change of variables from ir, U = 
no, Uq to tt t ,U t . Condition 2 is thus satisfied too. 

For sufficiently small r, the ergodicity of the algorithm (condition 3) can be proved 
as well. The proof is based on an expansion of ir T , U t in powers of r, from which one 
infers that the range of U T includes an open neighbourhood of U = Uq when the initial 
momentum n ~ t:q varies over a neighbourhood of the origin. In view of the non-linear 
nature of the theory, HMC simulations of lattice QCD are however expected to be 
ergodic at any (non-zero) value of r, even if one is unable to show this. Ergodicity can, 




(2.64) 




(2.65) 
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Fig. 2.2 The numerical integration of the molecular-dynamics equations proceeds in time 
steps of size e. In each step, the fields at time t + e are computed from the fields at time t and 
possibly those at earlier times (if a higher-order scheme is used) . The integration rule and the 
step size e should evidently be such that the calculated fields at time t = ne, n = 0, 1, . . . , No, 
closely follow the exact trajectory in phase space. 

in any case, always be rigorously ensured by choosing r € [0, r max ] randomly from one 
update step to the next. 

2.4.3 Numerical integration of the molecular- dynamics equations 

In practice, the molecular-dynamics equations cannot be integrated exactly and one 
must resort to some numerical integration method. As explained in Section 2.4.4, the 
integration error can be compensated by including an acceptance-rejection step in the 
HMC algorithm so that the correctness of the simulation is not compromised. 

The numerical integration proceeds by dividing the time interval [0, r] in No steps 
of size e and by applying a discrete integration rule that gives the correct result in the 
limit e — > (see Fig. 2.2). Considering the Taylor expansions 

7r t+e = 7r t - eF\ u=Ut + 0(e 2 ), (2.66) 

U t+ . =U t + en t U t + 0(e 2 ), (2.67) 

it is clear that acceptable integration schemes can be built from the elementary oper- 
ations 

2b(e): 7T, U -> 7T - eF, U, (2.68) 
Xu(e): n,U -> tt,c^U. (2.69) 

The combinationZo(^£)I(/(£)2<)(5 e )> for example, takes n t , Ut to n t + e , Ut+ € up an error 
of order e 3 . The complete integration from time t = to time t = r then amounts to 
applying the product 

Jo(e,iVo) = {Mle)lu{e)M^)} N ° , * = w> ( 2J0 ) 
to the initial fields. 

The "leap-frog integrator" (2.70) is remarkably simple and has a number of good 
properties. In particular, the integration is reversible, 

M-e,N )Jo(e,N ) = l, (2.71) 

and j7o(e, No) is therefore an invertible mapping of phase space. Starting from the ele- 
mentary integration steps (2.68) and (2.69), it is also trivial to show that the integrator 
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preserves the field integration measure D[7r]D[£/]. Through the numerical integration, 
these important properties of the molecular-dynamics evolution are thus not lost. 

The leap-frog integrator is widely used and appreciated for its simplicity, but there 
are many other integration schemes that can be employed (Leimkuhler and Reich, 
2004; Haircr et al., 2006). In particular, the so-called symplectic integrators all have 
the good properties mentioned above. 

2.4.4 Acceptance-rejection step 

For a fixed step size e, the numerical integration of the molecular-dynamics equations 
normally does not preserve the Hamilton function. In particular, the difference 

AH(n, U) = {H(n T , U T ) - H(tt , U )}^ Uo=u (2.72) 

does not vanish in general. The HMC algorithm (as defined through steps (a)-(c) in 
Section 2.4.2) consequently violates condition 2 if the integration is performed numer- 
ically. It is possible to correct for this deficit by replacing step (c) through 

(c') The new gauge Geld U' is set to the field U T obtained through the 
integration of the molecular-dynamics equations with probability 

PaccKC/) = min{l,c- AH ^}. (2.73) 

Otherwise, i.e. if the proposed held is rejected, U' is set to U. 
The transition probability density of the modified algorithm, 

T(tf ->#') = 4" / D[7r]e-^' r ^{p acc (7r,C/)n^'(^M),^r(^/i)) 

+(1 - PaccK U)) J] 6(U'(x, n), U(x, fi))}, (2.74) 

can then again be shown to have the required properties, for any value of the integra- 
tion step size e, provided the integrator is reversible and measure-preserving. 

The adjustable parameters of the HMC algorithm are then the trajectory length r, 
the integration step size e and further parameters of the integration scheme (if any). 
Evidently, the simulation will be inefficient if the average acceptance rate is low, i.e. if 
the numerical integration is not very accurate. The acceptance rate must however be 
balanced against the computer time required for the simulation, which grows roughly 
linearly with the step number TVo = r/e. In the case of the leap-frog integrator, for 
example, (P aC c) = 1 — 0(e 2 ) and the integration step size is then usually tuned so that 
acceptance rates of 70 — 80 percent are achieved. 

It is more difficult to give a recommendation on the value of r. Traditionally, the 
trajectory length is set to 1, but the choice of r can have an influence on the autocor- 
relation times and the stability of the numerical integration. Some empirical studies 
are therefore required in order to determine the optimal value of r in a given case. 
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2.5 Application to two-flavour QCD 

In QCD with a doublet of mass-degenerate sea quarks, the probability density to be 
simulated is 

p(U) = ie- s W S(U) = S e (U)-\n\detD(U)\ 2 , (2.75) 

where D(U) denotes the lattice Dirac operator in presence of the gauge field U (cf. Sec- 
tion 2.1.1). The HMC algorithm can in principle be used to simulate this distribution, 
but a straightforward application of the algorithm is not possible in practice, because 
the calculation of the quark determinant and of the force deriving from it would require 
an unreasonable amount of computer time. 

2.5.1 Pseudo-fermion fields 

This difficulty can fortunately be overcome using the pseudo-fermion representation 
|det D(U) | 2 = constant x J D[(j>] e~ s ^ u '^, (2.76) 

S pt (U,<f>) = (D(U)- 1 <t>,D(U)- 1 <l>), D[</>] = H d<f> Aa (x)d<f> Ace (x)*, (2.77) 

x,A : ct 

of the quark determinant. The auxiliary field <j>(x) introduced here carries a Dirac index 
A and a colour index a, like a quark field, but its components are complex numbers 
rather than being elements of a Grassmann algebra. For the scalar product in eqn (2.77) 
one can take the obvious one for such fields, with any convenient normalization. Step 
(a) of the HMC algorithm is then replaced by 

(a') A momentum Geld n and a pseudo-fermion Geld 4> are generated randomly 
with probabiHty density proportional to cxp{ — tt) — S p f(U, </>)}. 

Note that the pseudo-fermion action is quadratic in </>. The field can therefore be easily 
generated following the lines of Section 2.1.4. Once this is done, the algorithm proceeds 
as before, where the Hamilton function to be used in steps (b) and (c') is 

H(ir, U) = i(7T, tt) + S g (U) + S pi (U, cj)). (2.78) 

Only the momentum 7r and the gauge field U are evolved by the molecular-dynamics 
equations. The pseudo-fermion field <fi remains unchanged and thus plays a spectator 
role at this point. 

The steps (a'), (b) and (c') implement the transition probability density 
T(U^U') = - 1 — f DHDMc-I*")- 5 ^^) 

Ar-^pf(^) J 

x {P acc (^, U) H 6(U'(x, M ), U t (x, h)) + (1- Pacc^, U)) H 6(U'(x, ft), U(x, ft))}, 

X,fJ, X,fJ, 

(2.79) 

where Z p { (U) is the partition function of the pseudo-fermion field 4> in presence of the 
gauge field U. For simplicity, the dependence of U T and P acc on <f> has been suppressed. 
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Fig. 2.3 Number of floating-point operations required for the generation of 100 statistically 
independent gauge-field configurations in 0(a)-improved two-flavour QCD on a 64 x 32 3 
lattice with spacing a = 0.08 fm. The top curve (Ukawa, 2002) represents the status reported 
at the memorable Berlin lattice conference in 2001, the middle one was obtained a few years 
later, using the so-called domain-decomposed HMC algorithm (Liischer, 2005; Del Debbio 
et ai, 2007), and the lowest curve shows the performance of a recently developed deflated 
version of the latter (Liischer, 20076). 

Starting from this formula, it is then not difficult to show that the algorithm correctly 
simulates the distribution (2.75). 

2.5.2 Performance of the HMC algorithm 

The force F that drives the molecular-dynamics evolution in step (b) of the algorithm 
has two parts, Fq and the first deriving from the gauge action and the other from 
the pseudo-fermion action in the Hamilton function (2.78). In the case of the Wilson 
theory, for example, the forces are 



F£(x,id) = -Retr{T a C/(x,/i)M(x,/i)}, 
F?{x,ii) = -2 Re (kD-^J^DtP), 



(2.80) 
(2.81) 



where M(x, /i) is the staple sum (2.47) previously encountered in the pure gauge theory 
and 

<W,4(1 + l„)U(x, v)- l T a i>{x) - 6 x , y ±(l - ^)T a U(x, ^(x + A). (2.82) 

Note that the computation of the pseudo-fermion force F\ requires the Dirac equation 
to be solved two times. The by far largest fraction of the computer time is then usually 
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Fig. 2.4 Multiple time-step integration schemes divide the integration range [0, r] in a hier- 
archy of intervals of increasing sizes eo, ei, . . . such that et+i is an integer multiple of ex,. 

spent in this part of the simulation program, particularly so at small sea-quark masses 
where the Dirac operator becomes increasingly ill-conditioned. 

Traditionally the performance of QCD simulation algorithms is measured by count- 
ing the number of floating-point operations required for the generation of a sample 
of statistically independent gauge-field configurations. Such performance estimations 
are quite primitive and tend to be subjective to some extent, because the term "sta- 
tistically independent" is only loosely denned in this context. Moreover, the specific 
capabilities of the computers used for the simulation are not taken into account. The 
performance estimates plotted in Fig. 2.3 nevertheless clearly show that the simula- 
tions have become significantly faster since the beginning of the decade. In the following 
sections, some of the now widely used acceleration techniques are briefly described. 

The curves shown in Fig. 2.3 refer to a particular choice of the lattice action and 
the lattice parameters. If the lattice volume V is increased at fixed lattice spacing and 
quark mass, and if the leap-frog integrator is used, the numerical effort required for 
the simulations is known to scale like V 5 ^ 4 . As a function of the lattice spacing a, the 
scaling behaviour of the HMC algorithm is more difficult to determine and is currently 
a debated issue, but there is little doubt that the required computer time increases at 
least like a -7 . 

2.5.3 Multiple time-step integration 

An acceleration of the HMC algorithm can often be achieved using adapted integration 
step sizes for different parts of the force F in the molecular-dynamics equations (Sexton 
and Wcingartcn, 1992). The quark force Ft, for example, tends to be significantly 
smaller than the gauge force Fo and can therefore be integrated with a larger step size 
than the latter. 

If the integration step sizes for the forces Fo and F± are taken to be 

eo = 7Vck' (2 - 83) 
where Nq,N\ are some positive integers, the leap-frog integrators 

Jo(e , No) = {lo{\eo)Iu{eo)M\^)} N ° , (2.84) 

Ji(ei,iVi) = {l l (\e 1 )Jo{eo,No)li{\e 1 )} Nl , (2.85) 

integrate the molecular-dynamics equations from time t to t+ei and t+T, respectively 
(see Fig. 2.4). In these equations, 

Zfc(e) : tt,U ^ tt - eF k ,U (2.86) 
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Fig. 2.5 Divisions of the lattice into non-overlapping blocks of lattice points can be chess- 
board-coloured if there is an even number of blocks along each coordinate axis. The union 
of the sets of points contained in the black blocks is denoted by Q. and its complement (the 
points in the white blocks) by f2* . Their exterior boundaries, dfl and dfl* , consist of all points 
in the other set with minimal distance from the surfaces separating the blocks (open points). 

are the elementary integration steps involving the force Fk- In particular, the applica- 
tion of j7o(eo ; No) consumes relatively little computer time, because a computation of 
the quark force is not required. 

The hierarchical integration is profitable if the step size ei can be set to a value 
larger than eo without compromising the accuracy of the numerical integration too 
much. An acceleration of the simulation by a factor approximately equal to Nq is then 
achieved. 

2.5.4 Frequency splitting of the quark determinant 

The factorization 

IdctDl 2 = dct{DD f + fi 2 } x dct( — DD1 , 

l_ DDT -\- p 

of the quark determinant separates the contribution of the eigenvalues of DD* larger 
than /i 2 from the contribution of the lower ones. In the HMC algorithm, the two factors 
may be represented by two pscudo-fcrmion fields, <p\ and 02, with action 

S p t(U,4>) = (0i, + M 2 )-Vi) + (^2,02 +n 2 (DD^- 1 <p 2 ). (2.88) 

The quark force accordingly splits into two forces, Fx and F 2 , where the first is nearly 
insensitive to the quark mass while the second involves the inverse of DD^ and is, m 
this respect, similar to the force derived from the full quark determinant. 

When such determinant factorizations were first considered (Hasenbusch, 2001; 
Hasenbusch and Jansen, 2003), the main effect appeared to be that the fluctuations of 
the quark force along the molecular-dynamics trajectories were reduced. The integra- 
tion step size required for a given acceptance rate could consequently be increased by a 
factor 2 or so. Urbach et al. (2006) later noted that Fx is the far dominant contribution 
to the quark force at small fi. Using a multiple time-step integrator, and after some 
tuning of /i, an important acceleration of the simulation was then achieved. 

Another factorization of the quark determinant, leading to the DD-HMC algorithm 
(Liischcr, 2005), is obtained starting from a domain decomposition of the lattice like 



(2.87) 
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the ones previously considered in Section 1.3.4. The role of the scale fi that separates 
the high modes of the Dirac operator from low modes is here played by the block sizes, 
which are usually chosen to be in the range from 0.5 to 1 fm or so. 

For the associated factorization of the quark determinant to work out, one needs 
to assume that the lattice Dirac operator has only nearest-neighbour hopping terms, 
as in the case of the (improved) Wilson-Dirac operator, and that the block division 
of the lattice is chessboard-colourable (see Fig. 2.5). With respect to the subset O of 
points contained in the black blocks and its complement fi*, the Dirac operator then 
naturally decomposes into four parts, 

D = D Q + D n . + D d n + D dn , , (2.89) 

where Dn includes all (diagonal and hopping) terms acting inside f2 and Dn' those 
acting in 0*. The hopping terms from the white to the black blocks and the ones going 
in the opposite direction are included in Dgn and Dgn* , respectively. 

Using the fact that f2 and fl* are disjoint sets of lattice points, the factorization 

det D = det D n det D n , det{l - D^D aQ D^D dQ ,}, (2.90) 

may now be derived, where the first two factors further factorize into the determinants 
of the block Dirac operators. The factorization corresponds to a splitting of the quark 
force into an easy high-mode part (the block determinants) and an "expensive" but 
much smaller low- mode part. No fine-tuning is required in this case and an acceleration 
of the simulation is again achieved using a multiple time-step integrator. Moreover, 
the domain decomposition is, as usual, favourable for the parallel processing of large 
lattices 2 . 

2.5.5 Chronological inversion and deflation acceleration 

In the course of the integration of the molecular-dynamics equations, the Dirac equa- 
tion must be solved many times. The exact details depend on which integrator and 
acceleration techniques are used, but the equations to be solved are usually of the form 

Di) = 0, D X = J 5 ip, (2.91) 

where the source field 4> does not depend on the integration time t. 

For small integration step sizes, the gauge field changes only little from one step 
to the next and the solutions ipt and Xt obtained at time t therefore tend to evolve 
smoothly. One may thus attempt to predict them from the previous solutions through 
a polynomial extrapolation in t, for example (Brower et ai, 1997). In general, the 
predicted solutions of the Dirac equation are not sufficiently accurate, but this deficit 
can easily be removed through iterative improvement (Section 1.1.2). The total number 
of iterations performed by the solver program is then often significantly reduced. 

An important further reduction of the solver iteration numbers can be achieved us- 
ing the low-mode deflation method described in Section 1.3 (Luscher, 20076). Since the 

2 An efficient ISO C program implementing the DD-HMC algorithm for two-flavour QCD can be 
downloaded from http : // cern . ch/luscher/DDHMC/ index . html under the GNU Public License (GPL) . 
Many of the acceleration techniques discussed here are included in this package. 
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Fig. 2.6 History of the GCR solver iteration numbers required for the solution of the Dirac 
equation along a molecular-dynamics trajectory, showing the effect of the refreshing of the 
deflation subspace. The data points were obtained at a sea-quark mass of 26 MeV on a 64 x 32 3 
lattice with spacing a = 0.08 fm. On other lattices, the situation is practically the same. 

gauge field changes along a molecular-dynamics trajectory, the (domain-decomposed) 
deflation subspace must be refreshed from time to time in order to preserve its effi- 
ciency. An example illustrating this process is shown in Fig. 2.6. The subspace gen- 
eration at the beginning of the trajectory and the periodic refreshing of the subspace 
requires some computer time, but this overhead is largely compensated by the fact 
that the solutions of the Dirac equations (2.91) are obtained much more rapidly than 
without deflation, particularly so at small quark masses (see Fig. 2.3). 

2.5.6 Improved integrators 

The use of integration schemes other than the leap-frog integrator can be profitable 
if fewer evaluations of the quark force are required for a given integration accuracy. 
Higher-order schemes, for example, where the one-step error is reduced to 0(e 5 ), are 
not difficult to construct (Leimkuhlcr and Reich, 2004; Hairer et al, 2006), but so far 
did not prove to be significantly faster than the leap-frog integrator. 

Another possibility is to look for 0(e 3 ) integrators that minimize the integration 
error according to some criterion (Omelyan et al., 2002, 2003; Takaishi and de For- 
crand, 2006; Clark et al., 20086). An integrator of this kind is, in the case of a single 
time-step integration, given by 

J (e,N Q ) = {X (ie)Z c/ (ie)2 (e - e)l u {\e)I {\e)} N ° , (2.92) 

where e oc e is a tunable parameter. Experience suggests that the optimal values of e/e 
are in the range 0.3 — 0.5, depending a bit on the chosen accuracy criterion. Although 
the quark force needs to be computed two times per integration step, this integrator 
achieves a net acceleration by a factor 1.5 or so with respect to the leap-frog integrator, 
because fewer integration steps are required. 




40 Simulation algorithms 





I 


m 


M 



Fig. 2.7 Typical shape of the density p(X) of the eigenvalues A of the hermitian Wilson-Dirac 
operator Q s near the origin. At lattice spacings a < 0.1 fm, and if the strange-quark mass m s 
is greater or equal to its physical value, the distribution of the eigenvalues with the smallest 
magnitude extends only slightly inside the spectral gap of the density in the continuum limit 
(Del Debbio et at, 2006). 

2.6 Inclusion of the strange quark 

Unlike the light quarks, the strange and the heavy quarks do not pair up in approxi- 
mately mass-degenerate doublets. For various technical reasons, single quarks arc more 
difficult to include in the simulations than pairs and a special treatment is required. 

2.6.1 Strange-quark determinant 

One of the issues that needs to be addressed is the fact that the determinant 

dctD s = ± |det£) s | (2.93) 

of the strange-quark Dirac operator D s may not have the same sign for all gauge-field 
configurations. If chiral symmetry is exactly preserved on the lattice, the determinant 
is guaranteed to be positive, but sign changes from one configuration to another are 
not excluded in the case of the (improved) Wilson-Dirac operator, for example. The 
presence of positive and negative contributions to the QCD partition function poten- 
tially ruins the foundations on which numerical simulations are based. In particular, 
importance sampling ceases to have a clear meaning. 

In all current simulations of QCD that include the strange quark, the regions in 
field space, where the strange-quark determinant is negative, are assumed (or can be 
shown) to have a totally negligible weight in the functional integral. The operator in 
the determinant may then be replaced by the non-negative hermitian operator 

|Qs| = (Q 2 S ) 1/2 , Qs = J 5 D s , (2.94) 

without affecting the simulation results. As explained in the following sections, the so 
modified theory can again be simulated using the HMC algorithm. 

In the Wilson theory, the justification of this procedure rests on the observation that 
the physical strange quark is relatively heavy and that the hermitian Dirac operator 
Qs consequently tends to have a solid spectral gap in presence of the gauge fields that 
dominate the QCD functional integral (see Fig. 2.7). Since the gap tends to widen 
when the strange-quark mass is increased, the sign of the determinant of Q s cannot 
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change and is therefore the same as the one at large masses, where it can be proved 
to be positive. 

2.6.2 Pseudo-fermion representation 

Although well defined, the operator |Q S | is not directly accessible and one is forced to 
use approximations whenever its action on a quark field needs to be computed. In the 
present context, an approximation of its inverse is required, i.e. a tractable function 
R of Q 2 such that |Q s |i? — constant. An exact representation of the strange-quark 
determinant can then be given using two pseudo-fermion fields, 

det \Q S \ = constant x J D[0i]D[0 2 ] e^- 8 ^ 1 ^ (2.95) 

where only the second term in the pseudo-fermion action 

£pf, s = (^i,(|Q s |i*rVi) + (>2,-R02) (2.96) 

is included in the molecular-dynamics Hamilton function. The first term is nearly in- 
dependent of the gauge field and can be taken into account in the acceptance-rejection 
step at the end of the molecular-dynamics evolution. 

So far not many different approximations R were considered. The PHMC algorithm, 
for example, employs a polynomial approximation (Frczzotti and Janscn, 1997, 1999), 
while the RHMC algorithm is based on a rational approximation (Horvath et oil., 1999; 
Clark and Kennedy, 2007). There is also a complex version of the PHMC algorithm, 
where one starts from an approximation of D~ x by a polynomial in D s (Takaishi and 
de Forcrand, 2002; Aoki et ah, 2002). None of these algorithms is completely trivial to 
implement or obviously preferable in view of its simplicity, accessibility to acceleration 
techniques or speed. 

In the following, the RHMC algorithm is worked out as a representative case. The 
steps to be taken in the PHMC algorithm are similar, the main differences being issues 
of approximation accuracy and numerical stability. 

2.6.3 Optimal rational approximation 

The operator norm of the difference of two functions of Q\ is bounded from above by 
the maximal absolute deviation of the functions in the range covered by the eigenvalues 
of Q\. For the construction of optimal approximations of | Q s | 1 , some information on 
the spectral range of Qs i s therefore required as input. 

Since Q 2 nas a solid spectral gap, one can choose some fixed numbers M and e > 
such that the spectrum of Q\ is fully contained in the interval [eM 2 ,M 2 ] with high 
probability (i.e. for most configurations in a representative ensemble of gauge fields). 
Now let 

V ' (y + a 2 )(y + a A )...(y + a 2n ) v ; 

be a rational function of degree [n,n] in y, which approximates the function in 
the range [e, 1]. The operator R(Ql/M 2 ) then approximates M/|Q S | up to a relative 
error (in the operator norm) less than or equal to 
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6= max \l-y/yR(y)\ (2.98) 

if the spectrum of Q 2 is fully contained in [eM 2 ,M 2 ] and a somewhat larger error if 
the spectrum extends slightly beyond the limits of this interval. 

For a specified degree n, one would evidently like the coefficients a r in eqn (2.97) 
to be such that the error 5 is minimized. It seems unlikely that the optimal coefficients 
can be worked out analytically, but the mathematician Zolotarev was able to do that 
a long time ago by relating the optimization problem to the theory of elliptic functions 
(see Achiezer (1992), for example; the coefficients are given explicitly in Section 2.6.5) 3 . 

The optimal rational approximation has a number of remarkable properties. One 
of them is that the error 6 is a very rapidly decreasing function of the degree n. For 
e = 10 -5 , for example, the approximation error is 5 x 10~ 4 if n — 6 and decreases to 
1 x 10~ 7 and 8 x 10~ 15 if n = 12 and n = 24, respectively. The fact that the minimizing 
coefficients satisfy 

ai> a 2 > ... > a 2n > 0, A > 0, (2.99) 
is also very important. In particular, the residues in the expansion in partial fractions, 

R(y)=A\l + ^- + ... + ^—\, (2.100) 

are all positive. The expansion is therefore well suited for the numerical evaluation of 
the action of the operator R(Q 2 / M 2 ) on the pseudo-fermion field cf> 2 . Note that this 
calculation essentially amounts to solving the equations 

(Qs + l4k) V»k = M 2 <j) 2 , [i r = M^, (2.101) 

for k = 1, . . . , n. Since the right-hand sides of these equations are the same, it is possible 
to solve all equations simultaneously using a so-called multi-mass solver (Jegerlehner, 
1996). The total computational effort is then not very much larger than what would 
be required for the sequential solution of the equations that are the most difficult to 
solve (the ones at the largest values of k). 



2.6.4 The RHMC algorithm 

Having specified the operator R, the inclusion of the strange quark in the HMC algo- 
rithm is now straightforward. The algorithm proceeds according to the steps (a'), (b) 
and (c'), as before, and one merely has to add the contribution of the strange-quark 
pseudo-fermion fields. Some specific remarks on what exactly needs to be done may 
nevertheless be useful. 

3 Zolotarev obtained two different optimal rational approximations R(y) to the function one 
of degree [n,n] and the other of degree [n — l,n]. Both are used in lattice QCD and are commonly 
referred to as the Zolotarev rational approximation. 
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(a) Pseudo-fermion generation. The random generation of the strange-quark pseudo- 
fermion fields in the first step requires two operators B and C to be found such that 

BB^ = \Q S \R, and CC ] = FT 1 (2.102) 

(cf. Section 2.1.4). For C one can take the rational function 

C = A -l/2 (Q S + i^)...(Q S + i^n) (21Q3) 
{Qs + ipi) ...{Q s + l(J,2n-l) 

but there is no similarly simple choice for the other operator. However, since 

D 2 R 2 

Z=^r-1 (2.104) 

is of order S, the series 

B = s/M(l + Z) 1 ' 4 = VM {1 + \Z - ±Z 2 + ...} (2.105) 

converges rapidly and may be truncated after the first few terms. The strange-quark 
pseudo-fermion fields can thus be generated with a computational effort equivalent to 
the one required for a few applications of the operator R to a given quark field. 

(b) Strange-quark force. As already mentioned, only the second term of the pseudo- 
fermion action (2.96) is included in the molecular-dynamics Hamilton function. The 
computation of the associated force, 

F s a (x,n) = --^^r 2fc Re(V fc ,Q s 75^-Ds^), (2.106) 

k=l 

requires the n linear systems (2.101) to be solved. The computer time needed for this 
calculation is therefore essentially the same as for one application of the operator R. 

(c) Acceptance step. The acceptance probability is calculated as usual except for the 
fact that the change of the first term of the pseudo-fermion action (2.96), 

(|Q s |i?)-Vi)U^ - (<h, (IQsl^rVi)!^ , (2.107) 

must be added to the difference (2.72) of the molecular-dynamics Hamilton function. 
Note that the action difference (2.107) can be computed by expanding (|Q S 1 <^>i in 
powers of Z. 

2.6.5 Appendix: Coefficients of the optimal rational approximation 

The analytic expressions for the coefficients of the rational function (2.97) that min- 
imizes the approximation error (2.98) involve the Jacobi elliptic functions sn(u, k), 
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cn(u,k) and the complete elliptic integral K(k) (see Abramowitz and Stegun (1972), 
for example, for the definition of these functions) . Explicitly, they are given by 

ar = -^, , = 1,2,..., 2., (2.108) 

where 

k = VT~e, * = (2-109) 
in + 1 

The formulae for the amplitude A and the error 6, 

A= g_ ClC3 --- C2 - 1 , (2.110) 

1 + VI - d 2 c 2 C4 ...c 2n 

d 2 

5= g , (2.111) 

c r = sn 2 (rw,fc), r = l,2,...,2n, (2.112) 

d = fc 2n+1 (cic 3 ...c 2 „_i) 2 . (2.113) 

All these expressions are free of singularities and can be programmed straightforwardly, 
using the well-known methods for the numerical evaluation of the Jacobi elliptic func- 
tions. An ISO C program that calculates the coefficients A, a%,... ,a,2 n and the error 
5 to machine precision can be downloaded from http://cern.ch/luscher/. 
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Variance reduction methods 



Lattice QCD simulations produce ensembles {U\, . . . , Un} of gauge fields, which are 
representative of the functional integral at the specified gauge coupling and sea-quark 
masses. In this chapter it is taken for granted that autocorrelation effects can be safely 
neglected, i.e. that the separation in simulation time of subsequent field configurations 
is sufficiently large for this to be the case. As discussed in Section 2.2.4, the expectation 
value of any (real or complex) observable 0(U) may then be calculated through 

1 N 

<°> = ]v5> (C/fe) + 0(iV ~ 1/2) ' 

k=l 

where the statistical error is, to leading order in 1/N, given by 

= vo{0) = (\0-{0)\ 2 )V\ (3.2) 

In practice, the error is estimated from the variance of the "measured" values (9 (£/&), 
which is a correct procedure up to subleading terms. 

For a given observable O, another observable O' satisfying 

(0') = (0), a (O') « a (O), (3.3) 

can sometimes be found. The desired expectation value is then obtained with a much 
smaller statistical error if O is replaced by O'. Most variance-reduction methods are 
based on this simple observation. The construction of effective alternative observables 
is non-trivial, however, and may involve auxiliary stochastic variables and transforma- 
tions of the functional integral. 

The discussion in this chapter is often of a general nature and applies to most forms 
of lattice QCD, but if not specified otherwise, the Wilson formulation will be assumed, 
with or without 0(a)-improvement and with two or more flavours of sea quarks. 

3.1 Hadron propagators 

The calculation of the properties of the light mesons and baryons is a central goal in 
lattice QCD. Finding good variance-reduction methods proves to be difficult in this 
field, but some important progress has nevertheless been made. In this section, the aim 
is to shed some light on the problem by discussing the statistical variance of hadron 
propagators. 
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Fig. 3.1 Quark-line diagrams contributing to the pion propagator (1), the nucleon propa- 
gator (2) and the pion four-point function (3a and 3b). Directed lines from y to x stand for 
the light-quark propagator S(x, y). At the vertices, the spinor indices are contracted with the 
appropriate colour tensor and Dirac matrix (75 in the case of the pion-field vertices). 

3.1.1 The pion propagator 

Once the sea-quark fields are integrated out, the correlation functions of local fields 
like the isospin pseudo-scalar density 



T a : Pauli matrices, 



(3.4) 



become expectation values of quark-line diagrams. The diagrams are products of quark 
propagators, Dirac matrices and invariant colour tensors, with all Dirac and colour 
indices properly contracted (see Fig. 3.1). 

For any given gauge field, the light-quark propagator is determined through the 
field equation 

DS(x,y) = S xy (3.5) 

and the chosen boundary conditions. The spinor indices have been suppressed in this 
equation, but one should keep in mind that S(x,y) is, for fixed x and y, a complex 
12 x 12 matrix in spinor space. In view of the 75-hermiticity of the Dirac operator, 
forward and backward propagators are related by 



lbS{y,x)^j b = S(x,y)\ 



(3.6) 



where the hermitian conjugation refers to spinor space only. 

In lattice QCD, the pion mass m, is usually determined from the exponential decay 
of the "pion propagator" 



(P a (x)P b (y)) = -U ab (tv{S(x,y)S(x,y^}) 



(3.7) 



at large distances \x — y\. It is advantageous for this calculation to pass to the zero- 
momentum component of the propagator, 



9tt(xo - 2/0) = (O„(x ,y)), 
0^(x a ,y) = ^2tr{S(x,y)S(x,y) 1 }, 



(3.8) 
(3.9) 
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whose asymptotic form at large times, 

gM) oc c"" 1 " 4 + 0(c- 3m ' t ), (3.10) 

t— >oo 

is dominated by a single exponential with exponent equal to the pion mass. 

3.1.2 Statistical error estimation 

For a given gauge field and source point y, the calculation of the propagator S(x,y) 
allows the observable O^^o, y) to be evaluated at all times a?o. The ensemble average of 
Ott(xq, y) then provides a stochastic estimate of the zero-momentum pion propagator, 
from which one may be able to extract the pion mass. 

The statistical error of the average of 0- 7T {xo 1 y) is proportional to the square root 
of the a priori variance 

a (O 2 = (OAxo,y) 2 ) - (OAx ,y)) 2 , (3.11) 

which may symbolically be written in the form 

^'-g{<V>-<0><0>L; 

y y y 

An important point to note here is that the diagram in the first term also contributes to 
the 7T7T propagator (P a (x)P b (x')P a (y)P b (y)} (see Fig. 3.1). Moreover, the second term 
is just the square of the pion propagator. The variance is therefore expected to decay 
like c- 2m ^\ x o-yo\ at large time separations, which implies that the pion propagator is 
obtained with a nearly time-independent relative statistical error. 

Lattice QCD simulations tend to confirm this and they also show that the quark 
propagator typically falls off like 

tx{S{x, y)S{x, yY} 1/2 cx e 'h^-v\ (3 13 ) 

at large distances, for every gauge field in a representative ensemble of fields. There is 
actually little room for a different behaviour of the propagator, since both the mean 
and width of the distribution of tr{S(x, y)S(x, yY} decay exponentially with the same 
exponent. 

3.1.3 Baryons and the exponential SNR problem 

The diagrams contributing to the nuclcon two-point function involve 3 quark propaga- 
tors. At zero spatial momentum and for every gauge-field configuration, the associated 
observable On(xo, y) thus falls off roughly like e~ ^ m ^\ x o-vo\ a t large time separations. 
The nucleon propagator, however, decays much more rapidly, since the exponent (the 
nucleon mass m^r) is significantly larger than \m^. 

In the sum over all gauge fields, there must therefore be important cancellations 
among the measured values of O^ix^^y). More precisely, after averaging over N con- 
figurations, the nucleon propagator is obtained with a signal-to-noise ratio (SNR) 
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proportional to V^e ( mN 2 m ^)\ x o vo\^ The number of measurements required for a 
specified statistical accuracy at a given time separation thus scales like 

N (X e ( 2m N-3tn„)|2io-yo|^ (3-14) 

Note that the exponent 2tojv — 3m T is, in all cases of interest, not small and as large as 
7 fm _1 at the physical point. The calculated values of the nucleon propagator therefore 
tend to rapidly disappear in the statistical noise at large time separations \xq — j/ 1 - 

Computations of other hadron propagators are similarly affected by an exponential 
loss of significance, the only exception being the propagators of the stable pseudo-scalar 
mesons. The masses and decay constants of the latter can thus be determined far more 
easily than those of the vector mesons and the baryons. 



3.2 Using random sources 

As already noted in Section 2.1.3, the link variables in distant regions of a large lattice 
are practically decoupled. Hadron propagators calculated at widely separated source 
points therefore tend to be sampled independently and their average consequently has 
smaller statistical fluctuations than the propagator at a single source point. Averaging 
over as many source points as possible is thus desirable, but tends to be expensive in 
terms of computer time, because the quark propagators must be recomputed at each 
point. 

The random source method performs the sum over all or a selected set of source 
points stochastically (Michael and Peisa, 1998). In many cases, a significant accelera- 
tion of the computation can be achieved in this way. The idea of introducing random 
sources is also quite interesting from a purely theoretical point of view, because it ex- 
tends the notion of an observable to stochastic observables (i.e. functions of the gauge 
field that depend on random auxiliary variables). 

3.2.1 Gaussian random fields 

Random sources may be thought of as a set of additional fields that are decoupled from 
the dynamical fields and therefore do not change the physics content of the theory. In 
the case considered here, the added fields are a multiplct 

r)i(x), i = l,...,N SIC , (3.15) 

of pseudo-fermion fields on the fixed-time spatial lattice. Their action is taken to be 

iV src 

Sstc(v) = ^2iVi,Vi), (3-16) 

i=l 

where the scalar product is the obvious one for such fields. For each gauge-field con- 
figuration in a representative ensemble of fields, the source fields are chosen randomly 
with probability density proportional to e _Ssrc ^'. Evidently, the set of fields obtained 
in this way is a representative ensemble for the joint probability density of the gauge 
field and the source fields. 
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One may now consider observables 0(U,r}) that depend on both the gauge field and 
the sources. When the latter are integrated out, such observables reduce to ordinary 
(non-stochastic) observables. Noting 

(r)i(x)r]j(y)^) aTC = (3.17) 

the integral over the source fields is easily worked out, using Wick's theorem, if 0(U, rj) 
is a polynomial in the source fields. In the case of the observable 



O = -!- EE^^^k^o^), (3-18) 

1=1 x,y 

for example, the calculation yields 

(O)src =Y, tT i S ( X > X )}- ( 3 - 19 ) 

x 

The random sources thus allow the trace (3.19) to be estimated stochastically. 
3.2.2 The pion propagator revisited 

For any fixed time yo, the spinor fields 

&{x,yo) = ^2S(x,y)rn(y), i = 1, . . . , N SIC> (3.20) 
v 

can be computed by solving the Dirac equations D<f>i{x,yQ) = S Xo y r]i(x). It is then 
straightforward to show that the observable 

^(^0,2/0) = AT 1 T/ y~] |<Mz,j/o)| 2 , V 3 : spatial lattice volume, (3.21) 
Jv src K3 *-rz 

has the same expectation value as O^^xq, y) and may therefore be used in place of the 
latter in a calculation of the pion propagator. Note that through the average over the 
source fields, 

(6 n (x , y )) BIC = y 3 J2 f ). ( 3 - 22 ) 

y 

one effectively sums over all source points at time yo. 

Whether the new observable is any better than the old one depends on whether it 
has smaller statistical fluctuations or not. A short calculation, using Wick's theorem, 
shows that the associated a priori variance is given by 



N SIC Vj — — 

x,x' y,y' 



(3.23) 

x' =x ,y' =y 



where O 7r (xo,yo) denotes the volume-averaged observable (3.22). Recalling the expo- 
nential decay (3.13) of the quark propagator, the second term is readily estimated to 
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be of order (7V src m^ Va) -1 at large volumes. Since the first term scales like (m^Vs)^ 1 , 
it tends to be the dominant contribution to the variance already for moderately large 
numbers of source fields (setting N SIC = 12, for example, is often sufficient). 

With respect to a calculation of the pion propagator at a single source point , the 
random source method described here thus achieves a reduction of the statistical error 
by a factor proportional to (rn^Va) -1 ' 2 for approximately the same computational 
cost. The use of random sources is therefore recommended in this case, particularly so 
on large lattices. 

3.2.3 Further applications 

Random sources are an interesting and useful tool. Here only one particular application 
and variant of the method was discussed. Source fields on different subsets of points 
can be considered as well as random fields taking values in a group or, more generally, 
fields with any non-gaussian distribution. The method combines well with low-mode 
averaging techniques (Neff et al., 2001; Giusti et al., 2004; DeGrand and Schacfer, 
2004; Bali et al., 2005; Foley et al., 2005) and, to mention just one further example, it 
also played a central role in a recent calculation of the spectral density of the hermitian 
Wilson-Dirac operator (Giusti and Liischer, 2009). 

In the case of the vector-meson and baryon propagators, the application of random 
source methods is complicated by the fact that the index contractions at the vertices 
of the quark-line diagrams couple different components of the quark propagators. Dif- 
ferent kinds of random sources (one for each component, for example) then need to 
be introduced to be able to write down a correct random-source representation of 
the hadron propagator. The variance of such observables typically involves many dia- 
grams, among them often also disconnected ones. As a consequence, the use of random 
sources for these propagators is not obviously profitable, at least as long as the spatial 
extent of the lattices considered is not significantly larger than 2 or 3 fm. 

Random source methods should preferably be applied only after a careful analysis 
of the variance of the proposed stochastic observables. Such an analysis can be very 
helpful in deciding which kind of random fields to choose and how exactly the stochastic- 
observable must be constructed in order to achieve a good scaling of the statistical 
error with the lattice volume. 

3.3 Multilevel simulations 

Random source methods can lead to an important reduction of statistical errors, but 
are unable to overcome the exponential SNR problem encountered in the case of the 
nucleoli propagator, for example. In a multilevel simulation, the improved observables 
are constructed through a stochastic process, i.e. through a secondary or nested simu- 
lation. Exploiting the locality of the theory, an exponential reduction of the statistical 
error can then be achieved in certain cases. 

So far multilevel simulations have been limited to bosonic field theories, essentially 
because manifest locality is lost when the fermion fields are integrated out. Whether 
this limitation is a transient one is unclear at present, but it is certainly worth ex- 
plaining the idea here and to show its impressive potential. 
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3.3.1 Statistical fluctuations of Wilson loops 

In the following, a multilevel algorithm for the computation of the expectation values 
of Wilson loops in the pure SU(3) lattice gauge theory will be described. The lattice 
action is assumed to be the Wilson plaquette action. 

Similarly to the hadron propagators, Wilson loop expectation values suffer from 
an exponential SNR problem. Let C be a T x R rectangular loop in the (xq, xi)-plane, 
U(C) the ordered product of the link variables around the loop and W = tv{U(C)} its 
trace. For large loops, the expectation value of W satisfies the area law 

(W)~e -CTA , A = TR, (3.24) 

where a ~ 1 GeV/fm. Since W is a number of order 1 for every gauge-field configu- 
ration, the a priori variance cto(W) is practically equal to (|W| 2 ) and thus of order 1 
too. One therefore needs to generate ensembles of at least 

N = (W)~ 2 - c 2aA (3.25) 

configurations to be able to calculate the expectation value (W) to a useful precision. 

For loop areas A equal to I, 2 and 4 fm 2 , for example, the minimal ensemble sizes 
are thus estimated to be 2 x I0 4 , 6 x I0 8 and 4 x I0 17 , respectively. These figures may 
not be exactly right, because the Wilson loop is averaged over all possible translations 
in practice and since there are important subleading corrections to the area law (3.24). 
However, for large areas A, the minimal ensemble size is essentially determined by the 
rapidly growing exponential factor (3.25). 

3.3.2 Factorization and sublattice expectation values 

A multilevel algorithm invented many years ago is the "multihit method" (Parisi et ai, 
1983). In this case, a reduction of the statistical error by an exponential factor with 
exponent proportional to T is achieved by replacing the time-like link variables along 
the Wilson loop through stochastic estimates of their average values in presence of the 
other link variables. The method thus exploits the fact that the Wilson loop factorizes 
into a product of link variables. 

If the loop is instead factorized into a product of two- link operators, an exponential 
error reduction is obtained with an exponent proportional to the area A (Luscher and 
Weisz, 2001). A two-link operator is an object with four SU(3) indices, 

T{x ) a MS = U(x, 0)* ap U(x + Rl, 0) 7 5, (3.26) 

residing at time xo and x = (see Fig. 3.2). Two-link operators at adjacent times can 
be multiplied and their product from time to T — a yields the tensor product of the 
time-like lines of the Wilson loop. The latter is then given by 

W = tr{L(0)T(0)T(o) . . . T(T — a)L(T)}, (3.27) 

where the product 

Hx ) a p = {U(x, l)U(x + al, 1) . . . U(x + (R- a)l, 1)}^ (3.28) 
denotes the spatial Wilson line at time Xq. 
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Fig. 3.2 A T x R Wilson loop in the (xo, £i)-plane with lower-left corner at x = can be 
factorized into the product (3.27) of two-link and line operators. The SU(3) indices of the 
two-link operator (3.26) are assigned as shown in figure (1). The line operators (3.28) coincide 
with the spatial lines of the Wilson loop and have only two indices, as indicated in figure (2). 

Consider now the sublattice bounded by the equal-time hyperplanes at some time 
yo and some later time zo (see Fig. 3.3). The Wilson action couples the link variables 
inside the sublattice (the "interior" link variables) to themselves and the spatial fields 
on the boundaries, but there is no interaction with the field variables elsewhere on 
the lattice. If O is any function of the interior link variables, its sublattice expectation 
value is defined by 

P] = -}- f V[U]intO(U)e- S( > u \ (3.29) 

Ant J 

where one integrates over the interior variables. Note that the part of the action that 
does not depend on the latter drops out in the expectation value, which is thus a 
function of the spatial link variables at time yo and zo only. 

If the lattice is decomposed into non-overlapping sublattices of this kind, the func- 
tional integral divides into an integral over the interior variables and an integral over 
the boundary fields. For even T/a, for example, the Wilson loop expectation value 
may be rewritten in the form 



W = (tr{L(0) [T(0)T(a)] [T(2o)T(3a)] . . . L(T)}) . (3.30) 

The outer expectation value in this expression is the usual one involving an integration 
over all field variables. However, since the observable depends only on the spatial fields 
at time xq = 0, 2a, 4a, . . . , T, the integral over the interior field variables yields the 
product of the sublattice partition functions. This factor exactly cancels the product 
of the normalization factors of the sublattice expectation values. 

The sublattice expectation value of a product of two-link operators is a correlation 
function of two segments of Wilson lines in presence of the boundary fields. A single 
segment transforms non-trivially under the center symmetry of the theory, where all 
time-like link variables at a given time are multiplied by an element of the center of 
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Fig. 3.3 Factorized representations of the Wilson loop expectation value such as (3.30) are 
based on a division of the lattice into sublattices separated by equal-time hyperplanes. The 
sublattice expectation values [. . .] in these expressions involve an integration over the field 
variables residing on the links between the hyperplanes, all other link variables being held 
fixed. 

SU(3). Barring spontaneous symmetry breaking, its sublattice expectation value there- 
fore vanishes and the correlation function of two segments is consequently expected to 
go to zero at large separations R. Experience actually suggests that 

[T(y )T(y + a) . . . T(z - a)} ~ e"^")* (3.31) 

if the time difference zq — yg is larger than, say, 0.5 fm or so. Recalling eqn (3.30), the 
rapid decay of the Wilson loop expectation value at large times T (and the area law 
if eqn (3.31) holds) is thus seen to arise from a product of small factors. 

3.3.3 Multilevel update scheme 

The foregoing suggests that the expectation value of the Wilson loop may be accurately 
calculated, at any value of T, starting from a factorized representation like (3.30). An 
algorithm that implements the idea for this particular factorization proceeds in cycles 
consisting of the following steps: 

(a) Update the gauge field Nq times using a combination of the hcatbath 
and microcanonical link-update algorithms. 

(b) Estimate the expectation values [T(xq)T(xo + a)} by updating the 
held inside the associated sublattices Ni times and by averaging the 
product of the two-link operators over the generated conhgurations. 

(c) Compute the trace tr{L(0)[T(0)T(a)][T(2o)T(3a)] . . . L(T)} using the 
estimates of the sublattice expectation values obtained in step (b). 

Each cycle thus yields an estimate of the trace tr{L(0)[T(0)T(a)] . . . L(T)}. A moment 
of thought shows that these estimates are averages of the Wilson loop over a particular 
set field configurations generated through a valid simulation algorithm. Their average 
over many cycles therefore coincides with the expectation value of the Wilson loop up 
to statistical errors. 

As explained in Section 3.3.2, the trace tr{L(0)[T(0)T(a)] . . . L(T)} tends to decay 
exponentially, for every field configuration in a representative ensemble of fields. The 
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Fig. 3.4 Values of the correlation function of the Polyakov loop on a (T/a) x 12 3 lattice with 
spacing a ~ 0.17 fm and periodic boundary conditions, plotted as a function of the distance 
R at T = 12a (left figure) and as a function of T at 7? = 6a (right figure). 



statistical error of the stochastic estimates of the trace produced by the multilevel 
algorithm (a)-(c) is therefore guaranteed to fall off exponentially as well. With respect 
to a straightforward computation of the Wilson loop expectation value, the sublattice 
averaging thus achieves an exponential error reduction. 

For illustration, some results obtained in the course of an early application of the 
multilevel algorithm are shown in Fig. 3.4. In order to simplify the situation a little 
bit, the correlation function of two Polyakov loops (i.e. Wilson lines that wrap around 
the lattice in the time direction) was considered in this study. The line operators L(xo) 
are then not needed and the observable coincides with the trace of a product of T/a 
two-link operators, T being the time-like extent of the lattice. As is evident from the 
data plotted in the figure, the multilevel algorithm allows the exponential decay of 
the correlation function to be followed over many orders of magnitude, which would 
not be possible (or only with an astronomical computer budget) using the standard 
simulation techniques. 



3.3.4 Remarks and further developments 

Multilevel algorithms of the kind described here have been employed in studies of the 
string behaviour (Luscher and Weisz, 2002; Kratochvila and de Forcrand, 2003; Pepe 
and Wiese, 2009), the glueball spectrum (Meyer, 2003, 2004) and the bulk viscosity 
(Meyer, 2008) in pure gauge theories. Another, closely related multilevel algorithm has 
recently been proposed for the calculation of the energies of the lightest states with 
a specified (non-trivial) transformation behaviour under the exact symmetries of the 
theory (Delia Mortc and Giusti, 2009). In all these cases, an important and sometimes 
impressive acceleration of the simulation was achieved. 

The exponential SNR problem is, however, often not completely eliminated. Com- 
putations of Wilson loop expectation values, for example, require sublattice expecta- 
tion values of products of two-link operators to be determined to some accuracy. In 
view of eqn (3.31), the number of sublattice updates that need to be performed in this 
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part of the calculation is therefore expected to scale roughly like e 2<T ^ z °~ Vo ' R . With re- 
spect to a one-level simulation, the exponential growth of the required computational 
effort is nevertheless dramatically reduced, because the exponent now increases only 
proportionally to the distance R rather than the area A. 



4 



Statistical error analysis 



The estimation of the statistical errors in numerical lattice QCD appears to be an easy 
topic. However, the physical quantities often need to be extracted from the simulation 
data through some non-linear procedure, which may involve complicated fits and ex- 
trapolations. The correct propagation of the errors becomes a non-trivial task under 
these conditions. Resampling techniques such as the jackknifc and bootstrap methods 
(Efron and Tibshirani, 1993) allow the errors of the calculated physical quantities to 
be estimated with minimal effort, but are a bit of magic and are actually known to be 
incorrect in certain special cases (the jackknife method, for example, in general gives 
wrong results for the statistical error of the median of an observable) . 

The aim in this chapter is to build up a conceptually solid framework in which the 
error propagation is made transparent. In particular, the correctness of the jackknifc 
method can then be established for a class of quantities, which includes most cases of 
interest. 

4.1 Primary observables 

Wilson loops, quark-line diagrams and any other function of the gauge field, includ- 
ing the stochastic ones considered in Chapter 3, are the primary observables in lat- 
tice QCD. Their expectation values are the first quantities calculated in a simulation 
project and all physical quantities are eventually obtained from these. 

4.1.1 Correlation functions and data series 

Let A r be some real- valued primary observables labeled by an index r. As before, the 
lattice QCD expectation value of an observable O is denoted by (O). The quantities of 
interest are then the n-point correlation functions (A ri . . . A Tn ) . Independently of any 
locality properties, their connected parts (A ri . . . A rn ) c may be defined in the familiar 
way. In particular, 



{A r ) c 



(4.1) 



(A r A s ) c 



(A r A s ) - (A r )(A s ) 



(4.2) 



(A r A s At) c 



(A r A s A t ) - (A r )(A s A t ) - (A s )(A t A r ) - (A t )(A r A s ) 
+2(A r )(A s )(A t ). 



(4.3) 
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For an arbitrary number n of observables, the relation between the connected and the 
full correlation functions is given by the moment-cumulant transformation (see Section 
4.1.4). 

Suppose now that a representative ensemble of N statistically independent gauge- 
field configuration has been generated in the course of a simulation. The evaluation of 
the primary observable A r on these configurations yields a series 

Or.li a r,2, • ■ • , a r ,N (4-4) 

of "measured" values of this observable, whose average 

1 N 

a r = — ^a rii (4.5) 

i=l 

provides a stochastic estimate of its expectation value (A r ). 

If a Tj i and a Sj i, i = 1, . . . , N, are the data series obtained for the observables A r 
and A s , the data series for the product A r A s is a r ^a s _i. The product is actually just 
another primary observable, even in the case of the stochastic observables considered 
in Section 3.2 if the random sources are included as additional fields in the ensemble 
of fields generated by the simulation. A stochastic estimate of the n-point correlation 
function (A ri . . . A Tri ) is thus obtained by calculating the average of a ri ,i ■ ■ ■ ar n ,i- 

4.1.2 Simulation statistics 

As previously noted in Section 2.2.4, the question by how much a r deviates from (A r ) 
can only be answered statistically when the simulation is repeated many times. In 
the following, the average over infinitely many simulations of any function <j) of the 
measured values of the primary observables is denoted by ((</>)). The standard deviation 
of a r from (A r ), for example, is given by (((a r — (Ar)) 2 )) 1 / 2 . 

Simulation statistics becomes a useful tool when the following two assumptions are 
made. First it must be guaranteed that the measurements of the primary observables 
are unbiased, i.e. that the equality 

((a r ,i)) = (A r ) (4.6) 

holds for all i and all primary observables A r . The second assumption is that the fields 
generated by the simulations are statistically independent. In particular, 

((a r sa s ,j)) = ((a r ,i))((a s ,j)) if i 3- ( 4 - 7 ) 

More generally, the average ((a rit i 1 . . . (Xr„,i n )) factorizes into a product of averages, one 
for each subset of the factors a r ,, with a given value of the configuration index i. As 
discussed in Section 2.2.4, both conditions are fulfilled if the residual autocorrelations 
among the fields in the representative ensembles generated by the simulation algorithm 
are negligible. 

In practice the statistical independence of the measured values should be carefully 
checked by computing the associated autocorrelation functions. However, since the 
latter is estimated from the data and is therefore subject to statistical fluctuations, a 
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reliable determination of the autocorrelation times can sometimes be difficult, partic- 
ularly so if the available data series is not very long. The issue of "calculating the error 
of the error" was addressed by Madras and Sokal (1988) and in greater detail again 
by Wolff (2004) (see also Liischer (2005), Appendix E, for some additional material). 

4.1.3 Distribution of the mean values 

The mean values d r ,a s , . . . of the measured values of the primary observables are cor- 
related to some extent, because the underlying ensemble of gauge- field configurations 
is the same. It is possible to work out the correlations 

N N 

((a ri . . . 6 rk » = — £ ^2 ■ ■ ■ X] (( a ruh ■ ■ ■ a r k ,ik)) ( 4 - 8 ) 

«1=1 «'k=l 

in terms of the correlation functions (A ri . . . A Tn ) . In the case of the two-point corre- 
lation functions, for example, one obtains 

1 N 1 

((a r a s )) = j^J2 K.iO) = ( A r)(A s ) + jj(A r A s ) c , (4.9) 

where use was made of eqns (4.6) and (4.7). 

For any k > 1, the analogous expression for the correlation function (4.8) reads 

k 1 

({^■■•ar fc »=£_n E (A Pl ) c ...(A Pl ) c . (4.10) 

1 = 1 ■ P£V k ,l 

The second sum in this formula runs over the set Vk,i of all partitions P = (Pi, . . . , Pf) 
of the set {1, . . . , k} into / non-empty subsets. Furthermore, 

A Pi =l[A rr (4.11) 

jePi 

Note that the subsets Pi, . . , ,P; are ordered and therefore distinguished. In particular, 
at large N the dominant term is 

((a ri ...a r J = (A ri )...(A rk ) + 0(N- 1 ). (4.12) 

The proof of eqn (4.10) is a bit technical and is deferred to Section 4.1.4. 
The statistical properties of the deviations 

Sa r = a r - (A r ) (4.13) 

can now be determined as follows. First note that eqn (4.9) may be rewritten in the 
form 

((5a r 5a s )) = ^(A r A s ) c . (4.14) 
On average, the magnitude of the deviations Sa r is thus proportional to iV" 1 / 2 . 
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A more detailed characterization of the distribution of the deviations is obtained 
by working out their higher-order correlations. Starting from eqn (4.10), it is possible 
to show (Section 4.1.4) that 

k i 

«*a ri --.*a r *»=Elv*=j]i E (A Pl ) c ...(A Pl ) c , (4.15) 

' =1 ' Pev k ,, 

where Vk.i C "P/u denotes the set of partitions of {1, . . . , k} into / subsets with two or 
more elements. In particular, for all even k 

((Sa ri . ..Sa rk )) = {(A ri A r2 ) c . . . ( J 4 rfe _ 1 vl T . fe ) c + permutations} + . . . , (4.16) 

while for all odd k the leading terms are of order N~( k+X >^ , because the admissible 
partitions contain at least one subset with 3 or more elements. Taken together, these 
results show that the joint probability distribution of the scaled deviations y/N 5a r is 
gaussian to leading order in 1/N, with mean zero and variance (A r A s ) c . 

4.1.4 Appendix: Proof of eqns (4.10) and (4.15) 

Let J r be real- valued sources for the selected primary observables A r . The generating 
function of the correlation functions (A ri . . . A Vn ) is a formal power series 

OO 

Z(J) = 1 + E^E-' • E^ • • • J n ■■■ J r n (4-17) 

n— 1 ' r\ r n 

in these sources. Similarly, the generating function of the connected parts of the cor- 
relation functions is given by 

W ( J ) = E T\ E • • • E^ ■ ■ • A rJcJ ri ■ ■ ■ Jr n - (4.18) 
n— 1 ri r n 

The moment-cumulant transformation is then summarized by the identity 

Z(J)=e w(J) (4.19) 

among formal power series. 

The left-hand side of eqn (4.10) is related to the generating function Z( J) through 

u i d k ((exp{E r ,iar,iJ r })) 

{{a ri . . . a rk 



N k dJ ri . . . dJ rk 



J=0 



9kZ ^ N (4.20) 



N k dJ ri . . . dJ rk 



,7=0 



Use has here been made of the statistical independence of the data a r ^ at different val- 
ues of the index i and of the fact that their products at fixed i are unbiased estimators 



60 Statistical error analysis 



of the correlation functions of the primary observables. The insertion of eqn (4.19) and 
the subsequent expansion of the exponential function e w ( J ) now leads to the formula 



E 



d k w{jy 



N k -Hl d,J ri ...dJ r 



(4.21) 

j=o 



Each derivative in this expression acts on the I factors W( J) . . . W( J) one by one. It is 
then not difficult to convince oneself that the possible distributions of the derivatives 
to the factors match the possible partitions P £ Vk,i, thus proving eqn (4.10). 

The proof of eqn (4.15) proceeds in the same way. In order to obtain the correlation 
functions of the deviations Sa r , it suffices to substitute a r ^ — > a r: i — (A r ) on the right 
of eqn (4.20). The generating function in eqn (4.21) then gets replaced by 

VV(J) = W(J) - J2(A r )J r . (4.22) 

r 

Since at least two derivatives must act on each factor VV(J), the possible distributions 
of the derivatives to the factors now match the partitions P € Pk,i- 

4.2 Physical quantities 

The correlation functions of the primary observables only rarely have an immediate 
physical meaning. In the present context, any well-defined function of the expectation 
values (A r ), {A r A s ), ... is referred to as a physical quantity. Ratios of the expectation 
values of Wilson loops, for example, are considered to be physical quantities as well 
as the heavy-quark potential, which is a limit of such ratios. 

4.2.1 Prom the primary observables to the physical quantities 

In practice, the correlation functions required for the calculation of a physical quantity 
are approximated by the averages of the measured values of the appropriate primary 
observables. A prototype computation of the pion mass m w , for example, starts from 
the data series for the observables C T (a;o,2/) at all times Xq and, say, a single source 
point y (cf. Section 3.1.1). The so-called effective mass 

m c f f (x ) = -a In — (4.23) 

is then computed and fitted by a constant in a sensible range of xo, where the contribu- 
tions of the higher-energy states to the pion propagator are negligible with respect to 
the statistical errors. As long as this condition is satisfied, the fitted constant provides 
a stochastic estimate of the pion mass on the given lattice. 

Once a definite fit procedure is adopted, the so calculated values of the pion mass 
become a complicated but unambiguously defined function of the data series for the 
observables 0- K {xo, y)- It should be noted, however, that this function is not simply a 
function of the average values of these observables. The fit also requires an estimate 
of the covariancc matrix of the latter as input, which one obtains from the available 
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data through the jackknife method or in some other way. Evidently, when fits of fitted 
quantities are considered, as may be the case in studies of the quark-mass dependence 
of the pion mass, the functional dependence of the calculated quantities on the primary 
data series is further obscured. 

4.2.2 Stochastic estimators 

A stochastic estimator of a physical quantity Q is any function <f> of N and the measured 
values of the primary observables A r such that 

Q = lim with probability 1. (4.24) 

N— >oo 

In the following, a class of stochastic estimators with some additional properties will 
be considered. More precisely, it will be assumed that the asymptotic expansion 

oo 

* ~~ y2N- k ^ k \a ri7 a r27 ...) (4.25) 

k=0 

holds, where the coefficients cf>( k ' arc smooth functions of their arguments. The number 
of arguments may grow with k but is required to be finite for all k. 

It may not be obvious at this point why one needs to consider stochastic estimators 
with an explicit dependence on N. The leading coefficient in the expansion (4.25) is in 
fact a valid stochastic estimator for Q which depends only on the averages a r of the 
primary data series. However, when a stochastic estimator is implicitly defined through 
a fit procedure, for example, its leading coefficient at large N may be inaccessible in 
practice. In these cases one does not really have the choice and is forced to deal with 
the operationally well-defined but iV-dependent estimators. 

A simple example of an admissible stochastic estimator is provided by the effective 
mass (4.23). The role of the physical quantity Q is here played by the effective mass 
calculated from the exact pion propagator g n {xo—yo) . More complicated, TV-dependent 
stochastic estimators will be discussed in Section 4.3. 

4.2.3 Bias and covariance matrix 

Let Q a be some physical quantities, labeled by an index a, and <f> a stochastic estimators 
of these. On average, the values of the estimators obtained in a simulation approximate 
the physical quantities up to a deviation given by 

B a = {{54> a )), 54> a = cb a - Q a . (4.26) 

B a is referred to as the bias of the chosen estimators. The statistical fluctuations of 
the measured values are described by the covariance matrix 

C aSi = {(84> a 54>p)) (4.27) 
and are usually significantly larger than the bias. 
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For N — > oo, the bias and the covariance matrix can be expanded in a series in in- 
verse powers of N with coefficients depending on the correlation functions (A ri . . . A Tn ) 
of the primary observables. To show this, first recall that 

a r = (A r ) + Sa r , Sa r = 0(i\T 1/2 ). (4.28) 

The Taylor expansion of the coefficient functions in eqn (4.25) in powers of the devi- 
ations da r then leads to the expression 

r 

+ ^4> a 1} + \ drd s ^5a r 5a s + 0(N- 3 / 2 ) (4.29) 

r,s 

in which 

4>W=<pW((A ri ),(A r2 },...), 9 r = ^. (4.30) 

Evidently, since the leading term (j)^ coincides with Q a , the deviation d(f> a is given 
by the sum of all other terms on the right of eqn (4.29). 

It is important to realize that the only stochastic variables in the expansion (4.29) 
are the deviations 5a r . The averages (4.26) and (4.27) over repeated simulations can 
therefore be computed straightforwardly using the results obtained in Section 4.1. In 
a few lines one then obtains the result 

B a = + 1 1 Y / d r d s ^(A r A s ) c } + 0(N~ 2 ), (4.31) 

r,s 

C aSi = ^ ^9 r ^9 s ^ 0) (A r A s ) c + 0(iV- 2 ). (4.32) 

In particular, the bias of the stochastic estimators is of order TV -1 while their statistical 
fluctuations are of order iV -1 / 2 . 

4.3 Jackknife error estimation 

The bias B a and the covariance matrix C a p usually need to be estimated from the data. 
Such estimates can in principle be obtained via eqns (4.31) and (4.32) by calculating 
the expectation values and two-point functions of the relevant primary observables. 
The coefficient functions and must be known explicitly in this case. 

The jackknife method allows the bias and covariance matrix to be estimated even 
if the coefficient functions are inaccessible or too complicated to be used directly. More 
precisely, the method constructs stochastic estimators for the large- limits of NB a 
and NC a p (which are functions of the expectation values of the primary observables 
and therefore physical quantities). 
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4.3.1 Jackknife samples 

A jackknife sample of the measured values a ri i, . . . , a r< N of a primary observable A r is 
obtained by omitting one measurement from the full series. If, say, the i'th measure- 
ment is omitted, the corresponding jackknife sample consists of the measurements 

(4.33) 

Evidently, there are TV distinct jackknife samples of N — 1 measurements, labeled by 
the number i of the omitted measurement. 

The average of the measurements included in the jackknife sample number i is 
denoted by 

jv 



X — J2 a r,r (4-34) 



a J = — 
r - 1 N 



More generally, a stochastic estimator <f> assumes some value 4>f if the i'th measurement 
of the primary observables is discarded. Note that the jackknife samples are treated 
like any other measurement series of length N — 1 in this context. From the expansion 
(4.25) one then infers that 



jV->oc 



J2(N-l)- k ^(a J ri!i ,a J r2!i ,...), (4.35) 

k=0 

where the coefficient functions (j)^ are the same as before. 
4.3.2 Estimators for the bias and the covariance matrix 

The jackknife estimators for the bias and the covariance matrix are now given by the 
elegant formulae 

jv 

S£ = £«<-*«), (4.36) 

i=l 
N 

CU = " " (4.37) 

i=l 

An important result of the discussion below is going to be that the scaled expressions 
NB'l and NC^ are stochastic estimators in the sense of Section 4.2.2, the associated 
physical quantities being 



Urn NB a = 0« + |V d r dJW(A r A s ) c , (4.38) 

r,s 

hm NC aP = y2d r ^d s $f(A r A s ) c . (4.39) 



r.s 



The bias B a and the covariance matrix C a p are therefore approximated by the esti- 
mators B J a and C£g up to statistical fluctuations of order TV -3 / 2 and further terms of 
order N~ 2 . 
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The expansion of the jackknife estimators (4.36) and (4.37) for TV — > oo is obtained 
straightforwardly by substituting 

Or,i = a r + - (a r - Q r ,t) (4.40) 

in eqn (4.35) and by systematically expanding all terms in powers of TV -1 . In the case 
of the covariance matrix, for example, this leads to the expression 

C U = 4 E W 0) (a ri , . . O&^W • ■ .)(Sr- " SrS.) + 0(N- 2 ), (4.41) 



r.s 



where the notation has been simplified by setting 

iY 



1 E 



9 

a r ,ia S A, <9 r = —- . (4.42) 
TV oa r 

i—l 

The higher-order terms in eqn (4.41) involve averages of increasingly longer products 
of the measured values a r ,i, but are otherwise of the same structure as the leading one. 
Since the corresponding products of the primary observables are primary observablcs 
too, the expansion shows that NC£ g is a stochastic estimator of the kind specified in 
Section 4.2.2. Moreover, proceeding as in Section 4.2.3, the physical quantity approx- 
imated by NC^ B is seen to coincide with the expression on the right of eqn (4.39), as 
asserted above. 

An interesting feature of the jackknife method is the fact that it does not require 
the derivatives of the coefficient function </>(°) in the asymptotic expressions (4.38) and 
(4.39) to be calculated. As is evident from the derivation of eqn (4.41), the method 
effectively performs a numerical differentiation by probing the stochastic estimators 
<j) a through the jackknife samples. The estimators are thus implicitly assumed to vary 
smoothly on the scale of the deviations a J ri — a r — 0(N~ 1 ). 



4.3.3 Error propagation 

The jackknife method is easy to apply, because it requires no more than the evaluation 
of the stochastic estimators of interest for the full sample and the jackknife samples of 
the measured values of the primary observables. Note that ipi = ip(<f>{ jj • • • j V m i) if 
ip is a function of the stochastic estimators </>i, . . . , <f> m . The calculation can therefore 
often be organized in a hierarchical manner, where one proceeds from the primary 
observables to more and more complicated stochastic estimators. 

In the case of the computation of the pion mass sketched in Section 4.2.1, for 
example, the estimation of the statistical errors starts from the jackknife averages 
O^(xo,y)f of the primary observables O ff (xo,y). The bias and covariance matrix of 
the effective mass are then obtained from the jackknife sample values 

m cS {x ) i = -a m^=— -j— (4.43) 

using eqns (4.36) and (4.37). In particular, the jackknife estimator of the covariance 
matrix is given by 
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N 

C x x' = 5Z{^cff(^o); 7 - m off (x )}{m cff (x' )i - m eS {x' )}. (4.44) 

i=l 

The statistical error of m e ff(a;o), for example, is estimated to be (C^f .,. ) 1//2 . 

In the next step, the pion mass m T is computed by minimizing the x 2_ statistic 

ti ti 

X 2 = X] ^ ^ m cs(xo)}[(C J y 1 } XoX > o {m 7T - rricsixo)} (4.45) 

^o=to x' Q =t 

in some range [tojii] of time. The pion mass 

TOtt = - t (4.46) 

[(CJ)-i] 

XoSq 

thus calculated is algebraically expressed through the effective mass and the scaled 
jackknife estimator of the associated covariancc matrix. In particular, since all of these 
are stochastic estimators of the kind specified in Section 4.2.2, the same is the case for 
the pion mass determined in this way. Its statistical error can therefore be computed 
using the jackknife method. 

Following the general rules, the application of the jackknife method requires the 
pion mass to be recomputed for each jackknife sample of the primary data. In partic- 
ular, the covariance matrix of the effective mass must be recomputed, which requires 
the jackknife samples of the jackknife samples to be considered, i.e. samples where a 
second measurement is discarded from the full data series. The computational effort 
thus grows like iV 2 , but the calculation is otherwise entirely straightforward. 

In practice a simplified procedure is often adopted, where the statistical error of the 
covariance matrix is ignored. The matrix is calculated for the full sample in this case 
and is held fixed when the errors of the effective mass arc propagated to those of the 
pion mass. An advantage of this procedure is that the computational effort increases 
like TV rather than TV 2 , but the correctness of the error estimation can only be shown 
for asymptotically large values of N and if the systematic deviation of the effective 
mass from being constant in time is much smaller than the statistical errors. Moreover, 
the bias of the pion mass is no longer correctly given by the jackknife formula (4.36). 
It is therefore advisable to pass to the simplified procedure only after having checked 
its consistency with the results of the full procedure. 
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